
generator: add simonw/llm library support#1633

Open
Nakul-Rajpal wants to merge 6 commits into NVIDIA:main from Nakul-Rajpal:generator-wrapLLM-v2

Conversation


@Nakul-Rajpal Nakul-Rajpal commented Mar 4, 2026

Follow-up to #1382: second PR on a new branch with updated code

Three files changed:

  • docs/source/garak.generators.llm.rst: Sphinx autodoc stub for the new LLM generator module
  • garak/generators/llm.py: New generator wrapping Simon Willison's llm library, supporting OpenAI, Claude, Gemini, and other providers via plugins
  • tests/generators/test_llm.py: Unit tests for LLMGenerator covering instantiation, parameter passthrough, error handling, and multiple generations

Verification

  • pip install llm
  • garak -m llm -n gpt-4o-mini
  • Run the tests and ensure they pass: python -m pytest tests/generators/test_llm.py
  • Verify generation works with at least one llm-supported model
  • Verify error handling when llm package is not installed
  • Document the generator and how it works

--
resolves #463

@Nakul-Rajpal
Contributor Author

@jmartin-tech

@jmartin-tech jmartin-tech changed the title from "Rebuilt PR from clean main with ONLY LLM generator changes" to "generator: add simonw/llm library support" on Mar 4, 2026
@Nakul-Rajpal
Contributor Author

@jmartin-tech is this branch ready to merge?

@leondz leondz requested review from jmartin-tech and leondz March 14, 2026 18:18
Collaborator

@jmartin-tech jmartin-tech left a comment


@Nakul-Rajpal this rebase looks to have been AI generated from the previous PR and needs closer author review and testing.

While using coding assistants to aid in contributing to this project is considered acceptable, this PR does not meet the project's needs in its current form.

Contributors must test and evaluate any contribution in the spirit of expanding the capabilities of the core project, and align with design expectations for naming conventions and separation of responsibilities in the codebase.

This PR was not properly rebased on the upstream NVIDIA:main branch and needs further alignment with upstream patterns. Review of the branch shows the rebase was on 711718c, which is from late Aug 2025.

Testing shows this PR currently ends up in an endless backoff loop when the target configuration is not viable for the llm environment, such as when the target model name is not available or a parameter is not supported for passthrough to the target model.

Once updated, there will need to be additional testing for errors reported by the llm library.

Note also that previous feedback in PR #1382, related to accepted_params based on enumerating the parameters for the target, is missing from this iteration.

This PR is also missing the updates to requirements.txt and pyproject.toml that are required for the generator to load.

Comment on lines +70 to +79
    def _clear_client(self) -> None:
        self.target = None

    def __getstate__(self) -> object:
        self._clear_client()
        return dict(self.__dict__)

    def __setstate__(self, data: dict) -> None:
        self.__dict__.update(data)
        self._load_client()
Collaborator


Based on the current upstream main these are no longer needed as they are now handled by the Configurable class updates in #1545.

        self.__dict__.update(data)
        self._load_client()

@backoff.on_exception(backoff.fibo, Exception, max_value=70)
Collaborator


Backoff on a generic Exception is highly susceptible to creating an infinite loop; please update to catch a more limited set of exceptions, following the patterns in #1199 to capture valid types, especially when based on deferred loading.
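To illustrate why, here is a minimal retry sketch (the class names are stand-ins borrowed from this thread, and the backoff sleep is omitted): retrying only a dedicated trigger exception lets non-recoverable errors escape immediately, while backing off on bare Exception would retry them indefinitely.

```python
class GeneratorBackoffTrigger(Exception):
    """Stand-in for garak's retryable-error signal."""

class UnknownModelError(Exception):
    """Stand-in for a non-recoverable llm error (e.g. a bad model name)."""

def call_with_retry(fn, max_attempts=3):
    # Retry only the dedicated trigger; any other exception propagates at
    # once, so a misconfigured target cannot spin in an endless backoff loop.
    last = None
    for _ in range(max_attempts):
        try:
            return fn()
        except GeneratorBackoffTrigger as e:
            last = e  # retryable: the real decorator would sleep here
    raise last
```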

Comment on lines +54 to +59
        self._load_client()

    def _load_client(self) -> None:
        import llm as llm_lib

        self.llm = llm_lib
Collaborator


Rename this to _load_unsafe() to match the current pattern from main. Also, extra_dependencies are loaded by base.Probe.__init__() prior to calling _load_unsafe():

Suggested change
-        self._load_client()
-    def _load_client(self) -> None:
-        import llm as llm_lib
-        self.llm = llm_lib
+        self._load_unsafe()
+    def _load_unsafe(self) -> None:
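The deferred-import pattern behind _load_unsafe() can be sketched in isolation; the helper name and error message here are illustrative assumptions, not garak's or llm's actual code:

```python
import importlib

def load_optional(module_name):
    # Importing at generator-load time, rather than at module import time,
    # keeps the module importable when the optional dependency is absent
    # and surfaces a clear error when it is actually needed.
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise RuntimeError(f"{module_name} is not installed") from e
```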

@Nakul-Rajpal Nakul-Rajpal force-pushed the generator-wrapLLM-v2 branch from eb378cf to 29a7f9f on March 17, 2026
@Nakul-Rajpal
Copy link
Contributor Author

@jmartin-tech These comments make sense. Here are additional issues I found after code changes and testing.

Before fixing the code itself, I properly rebased onto the current remote upstream main.

  1. Since I hadn't tested locally (I didn't install the packages, and they weren't automatically added to requirements/pyproject.toml), I added llm>=0.12 to requirements.txt.

  2. My naming conventions, as previously mentioned, were not aligned with those of similar generators in the codebase. I renamed _load_client() to _load_unsafe() and removed the custom __getstate__ and __setstate__ because they are handled by the Configurable class.

  3. I removed the unused _supported_params and kept _enumerate_model_params() to check which parameters each model actually supports before they are passed through.

  4. I added _unsafe_attributes = ["target"], so the base class nullifies the target during pickling and recreates it with the _load_unsafe() method on deserialization.

  5. I fixed the infinite loop. The @backoff.on_exception(...) decorator was replaced with GeneratorBackoffTrigger. ModelError and rate-limit errors now trigger the retry, while UnknownModelError and BadRequestError fail immediately or return None.


I then tested locally and made sure that:

  1. Using a fake model name fails the run immediately without a backoff loop.

  2. test.Blank does not crash the run.

  3. Running an actual probe (in this case dan.DanInTheWild) against a sample model (gpt-4o-mini) completed fully. It reported 942/1280, which is aligned with that model's capabilities.

  4. I also updated all syntax to be in line with garak requirements (Python 3.10+).
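The _unsafe_attributes behavior described in point 4 can be sketched in isolation; this is a simplified stand-in for what the base class does, not garak's actual Configurable code:

```python
class UnsafeAttrSketch:
    # attributes listed here hold unpicklable handles (e.g. an llm model)
    _unsafe_attributes = ["target"]

    def __init__(self):
        self._load_unsafe()

    def _load_unsafe(self):
        # stands in for creating the llm model handle
        self.target = object()

    def __getstate__(self):
        # called by pickle: nullify handles that cannot be serialized
        state = dict(self.__dict__)
        for attr in self._unsafe_attributes:
            state[attr] = None
        return state

    def __setstate__(self, data):
        # called on unpickling: restore state, then recreate the handle
        self.__dict__.update(data)
        self._load_unsafe()
```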

Comment on lines +125 to +128
            except self.llm.ModelError as e:
                raise GeneratorBackoffTrigger from e
            except Exception as e:
                backoff_types = ("RateLimitError", "APITimeoutError", "APIConnectionError")
Collaborator


If all self.llm.ModelError exceptions should just retry then a single exception handler is viable:

            except Exception as e:
                backoff_types = (self.llm.ModelError.__name__, "RateLimitError", "APITimeoutError", "APIConnectionError")

However, looking at llm.ModelError, it has a single subtype that should terminate the inference attempts, which is reason to keep the more focused handler and filter, raising GeneratorBackoffTrigger only when a retry has the potential to result in a valid completion.

Something like:

          except self.llm.NeedsKeyException as e:
              logging.error("llm generation failed: %s", repr(e))
              outputs.append(None)
          except Exception as e:
              backoff_types = (self.llm.ModelError, "RateLimitError", "APITimeoutError", "APIConnectionError")

For the NeedsKeyException case, however, I am not sure there is any chance of recovery or progress for the entire run, so it may actually be more appropriate to allow that exception to bubble up:

          except self.llm.NeedsKeyException as e:
              raise e
          except Exception as e:
              backoff_types = (self.llm.ModelError, "RateLimitError", "APITimeoutError", "APIConnectionError")

Contributor Author


Separated the two cases so that NeedsKeyException is caught first and returns None. The base ModelError check uses isinstance and raises GeneratorBackoffTrigger.

Comment on lines +119 to +134
        for _ in range(generations_this_call):
            try:
                response = self.target.prompt(
                    text_prompt, system=system_text, **prompt_kwargs
                )
                outputs.append(Message(response.text()))
            except self.llm.ModelError as e:
                raise GeneratorBackoffTrigger from e
            except Exception as e:
                backoff_types = ("RateLimitError", "APITimeoutError", "APIConnectionError")
                exc_name = type(e).__name__
                if exc_name in backoff_types:
                    raise GeneratorBackoffTrigger from e
                logging.error("llm generation failed: %s", repr(e))
                outputs.append(None)
        return outputs
Collaborator


The backoff action looks to be scoped too broadly. If an exception were to trigger backoff here on anything other than the first iteration of the loop then completed inference outputs would be abandoned and a new set of generations_this_call inference requests would be made.

I would suggest that the code inside this loop is the actual backoff boundary, and the code should be refactored to account for this. Something like:

    @backoff.on_exception(backoff.fibo, GeneratorBackoffTrigger, max_value=70)
    def _get_response(self, text_prompt, system_text, prompt_kwargs):
        try:
            response = self.target.prompt(
                text_prompt, system=system_text, **prompt_kwargs
            )
            return Message(response.text())
        except self.llm.NeedsKeyException as e:
            logging.error("llm generation failed: %s", repr(e))
            return None
        except Exception as e:
            backoff_types = (self.llm.ModelError.__name__, "RateLimitError", "APITimeoutError", "APIConnectionError")
            exc_name = type(e).__name__
            if exc_name in backoff_types:
                raise GeneratorBackoffTrigger from e
            logging.error("llm generation failed: %s", repr(e))
            return None

        for _ in range(generations_this_call):
            outputs.append(self._get_response(text_prompt, system_text, prompt_kwargs))
        return outputs

Contributor Author


I moved the retry to _call_single so that each generation gets its own independent retry.
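A hypothetical sketch of that shape (the _call_single and GeneratorBackoffTrigger names follow this thread; the plain retry loop stands in for the backoff decorator and omits sleeping): each generation retries independently, so completed outputs are never discarded.

```python
class GeneratorBackoffTrigger(Exception):
    """Stand-in for garak's retryable-error signal."""

def _call_single(make_request, max_attempts=3):
    # one retry budget per generation
    for _ in range(max_attempts):
        try:
            return make_request()
        except GeneratorBackoffTrigger:
            continue  # the real decorator would back off here
    return None  # retries exhausted: record a failed generation

def generate_all(make_request, generations_this_call):
    # a failure in generation N no longer discards generations 0..N-1
    return [_call_single(make_request) for _ in range(generations_this_call)]
```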

Comment on lines +50 to +54
        self.name = name
        self._load_config(config_root)
        self.fullname = f"llm:{self.name}"

        super().__init__(self.name, config_root=config_root)
Collaborator


Is there some reason to load config early here?

        super().__init__(name, config_root=config_root)
        self.fullname = f"llm:{self.name}"

Contributor Author


There was no reason to do that; now I am passing the name directly to super().__init__().

Comment on lines +129 to +131
                exc_name = type(e).__name__
                if exc_name in backoff_types:
                    raise GeneratorBackoffTrigger from e
Collaborator


Why is this based on string name values? I would expect to be able to use the types themselves similar to how the cohere generator does the handling.

See:

            except Exception as e:
                backoff_exception_types = (
                    self.cohere.errors.GatewayTimeoutError,
                    self.cohere.errors.TooManyRequestsError,
                    self.cohere.errors.ServiceUnavailableError,
                    self.cohere.errors.InternalServerError,
                )
                for backoff_exception in backoff_exception_types:
                    if isinstance(e, backoff_exception):
                        raise GeneratorBackoffTrigger from e  # bubble up ApiError for backoff handling
                logging.error(f"Chat API error: {e}")
                responses.append(None)

Are there multiple types with the same __name__ result that could be raised here but still provide a viable case to retry the request?

Contributor Author


Now I'm using isinstance(e, self.llm.ModelError), which is the only thing that triggers a retry; otherwise we log and return None.
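The difference is visible with stand-in exception classes; per the discussion above, NeedsKeyException is the ModelError subtype that should not be retried (the classify helper is purely illustrative):

```python
class ModelError(Exception):
    """Stand-in for llm.ModelError."""

class NeedsKeyException(ModelError):
    """Stand-in for llm.NeedsKeyException, the non-retryable subtype."""

def classify(exc):
    # isinstance also matches subclasses, which a type(exc).__name__
    # string comparison would silently miss
    if isinstance(exc, NeedsKeyException):
        return "fail"   # missing API key: a retry cannot succeed
    if isinstance(exc, ModelError):
        return "retry"  # model error: back off and retry
    return "log"        # unknown error: log it and record None
```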

Comment on lines +38 to +39
"top_p": None,
"stop": [],
Collaborator


Consider adding support similar to the OpenAICompatible and LiteLLM generators to suppress or extend options passed to this proxy library.

In particular suppressed_params and extra_params would enable control of additional options to the inference request that may vary depending on the target. These options would need to be incorporated into the logic of _enumerate_model_params().

Contributor Author


I have now added suppressed_params and extra_params to the default params. It should now match the pattern in OpenAICompatible and LiteLLM.
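A rough sketch of how suppressed_params and extra_params could feed into parameter resolution (the function and argument names are assumptions based on this comment, not the PR's actual code):

```python
def resolve_params(model_options, configured, suppressed_params=(), extra_params=None):
    # keep only options the target model supports, minus suppressed ones,
    # then layer user-supplied extras on top
    kwargs = {
        k: v
        for k, v in configured.items()
        if k in model_options and k not in suppressed_params and v is not None
    }
    kwargs.update(extra_params or {})
    return kwargs
```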

…xtra_params to default_params

Signed-off-by: Nakul Rajpal <[email protected]>

Development

Successfully merging this pull request may close these issues.

generator: datasette / llm
