generator: add simonw/llm library support #1633

Nakul-Rajpal wants to merge 6 commits into NVIDIA:main.

Conversation
@jmartin-tech is this branch ready to merge?
jmartin-tech left a comment:
@Nakul-Rajpal this rebase looks to have been AI generated from the previous PR and needs closer author review and testing.
Using coding assistants to aid in contributing to this project is considered acceptable; however, this PR does not meet the project's needs in its current form.
Contributors must test and evaluate any contribution in the spirit of expanding the capabilities of the core project and align with design expectations for naming conventions and separation of responsibilities in the codebase.
This PR was not properly rebased on the upstream NVIDIA:main branch and needs further alignment with upstream patterns. Review of the branch shows the rebase was on 711718c, which is from late Aug 2025.
Testing shows this PR currently ends up in an endless backoff loop when the target configuration is not viable for the llm environment, such as when the target model name is not available or a parameter is not supported for pass-through to the target model.
Once updated there will need to be additional testing for errors reported by the llm library.
Note also previous feedback in PR #1382: the accepted_params handling based on enumerating the parameters for the target is missing from this iteration.
This PR is also missing updates to requirements.txt and pyproject.toml that are required for the generator to load.
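For illustration, the missing dependency declarations might look something like the following sketch (the exact section names and any version pin depend on garak's actual pyproject layout, so treat this as hypothetical):

```toml
# pyproject.toml (hypothetical sketch of an optional-dependency entry)
[project.optional-dependencies]
llm = ["llm"]
```

A matching `llm` line would also go into requirements.txt so the generator can load its dependency.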
garak/generators/llm.py
Outdated
```python
    def _clear_client(self) -> None:
        self.target = None

    def __getstate__(self) -> object:
        self._clear_client()
        return dict(self.__dict__)

    def __setstate__(self, data: dict) -> None:
        self.__dict__.update(data)
        self._load_client()
```
Based on the current upstream main these are no longer needed as they are now handled by the Configurable class updates in #1545.
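For reference, the pattern these methods implemented can be sketched generically (class and attribute names hypothetical, not garak's actual code): an unpicklable client handle is dropped before serialization and rebuilt after unpickling.

```python
import pickle


class ClientHolder:
    """Generic sketch of the clear-on-pickle, reload-on-unpickle pattern."""

    def __init__(self):
        self._load_client()

    def _load_client(self):
        # Stand-in for an unpicklable resource (e.g. a socket-backed SDK client).
        self.target = object()

    def __getstate__(self):
        state = dict(self.__dict__)
        state["target"] = None  # drop the unpicklable handle before pickling
        return state

    def __setstate__(self, data):
        self.__dict__.update(data)
        self._load_client()  # rebuild the handle after unpickling


holder = ClientHolder()
clone = pickle.loads(pickle.dumps(holder))
print(clone.target is not None)  # → True: the client was rebuilt on load
```

With the Configurable updates in #1545 handling this centrally, the per-generator boilerplate above is no longer needed.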
garak/generators/llm.py
Outdated
```python
        self.__dict__.update(data)
        self._load_client()

    @backoff.on_exception(backoff.fibo, Exception, max_value=70)
```
Backoff on generic Exception is highly susceptible to creating an infinite loop. Please update this to catch a more limited set of exceptions, following the patterns in #1199, so that only genuinely retryable types trigger backoff, especially given the deferred loading.
garak/generators/llm.py
Outdated
```python
        self._load_client()

    def _load_client(self) -> None:
        import llm as llm_lib

        self.llm = llm_lib
```
Rename this to _load_unsafe() to match the current pattern from main. Also, extra_dependencies are loaded by base.Probe.__init__() prior to calling _load_unsafe():

```diff
-        self._load_client()
+        self._load_unsafe()

-    def _load_client(self) -> None:
+    def _load_unsafe(self) -> None:
         import llm as llm_lib

         self.llm = llm_lib
```
- Signed-off-by: s-nrajpal <[email protected]>
- …-name/self.llm) Signed-off-by: s-nrajpal <[email protected]>
- …o test properly Signed-off-by: s-nrajpal <[email protected]>
Force-pushed the branch from eb378cf to 29a7f9f.
@jmartin-tech These comments make sense. Here are additional issues I found after making code changes and testing. Before fixing the code itself, I properly rebased onto the current remote upstream main. I then tested locally and made sure the changes behaved as expected.
garak/generators/llm.py
Outdated
```python
            except self.llm.ModelError as e:
                raise GeneratorBackoffTrigger from e
            except Exception as e:
                backoff_types = ("RateLimitError", "APITimeoutError", "APIConnectionError")
```
If all self.llm.ModelError exceptions should just retry, then a single exception handler is viable:

```python
except Exception as e:
    backoff_types = (self.llm.ModelError.__name__, "RateLimitError", "APITimeoutError", "APIConnectionError")
```

However, looking at llm.ModelError, it has a single subtype that should terminate the inference attempts, which is reason to keep the more focused handler and filter, raising GeneratorBackoffTrigger only when a retry has the potential to result in a valid completion.

Something like:

```python
except self.llm.NeedsKeyException as e:
    logging.error("llm generation failed: %s", repr(e))
    outputs.append(None)
except Exception as e:
    backoff_types = (self.llm.ModelError, "RateLimitError", "APITimeoutError", "APIConnectionError")
```

For the NeedsKeyException case, however, I am not sure there is any chance of recovery or progress for the entire run, so it may actually be more appropriate to allow that exception to bubble up:

```python
except self.llm.ModelError as e:
    raise e
except Exception as e:
    backoff_types = (self.llm.ModelError, "RateLimitError", "APITimeoutError", "APIConnectionError")
```
Separated the two cases so that NeedsKeyException is caught first and returns None. The base ModelError check uses isinstance and raises GeneratorBackoffTrigger.
garak/generators/llm.py
Outdated
```python
        for _ in range(generations_this_call):
            try:
                response = self.target.prompt(
                    text_prompt, system=system_text, **prompt_kwargs
                )
                outputs.append(Message(response.text()))
            except self.llm.ModelError as e:
                raise GeneratorBackoffTrigger from e
            except Exception as e:
                backoff_types = ("RateLimitError", "APITimeoutError", "APIConnectionError")
                exc_name = type(e).__name__
                if exc_name in backoff_types:
                    raise GeneratorBackoffTrigger from e
                logging.error("llm generation failed: %s", repr(e))
                outputs.append(None)
        return outputs
```
The backoff action looks to be scoped too broadly. If an exception were to trigger backoff here on anything other than the first iteration of the loop then completed inference outputs would be abandoned and a new set of generations_this_call inference requests would be made.
I would suggest that the code inside this loop is the actual backoff boundary, and the code should be refactored to account for this. Something like:

```python
@backoff.on_exception(backoff.fibo, GeneratorBackoffTrigger, max_value=70)
def _get_response(text_prompt, system, prompt_kwargs):
    try:
        response = self.target.prompt(
            text_prompt, system=system_text, **prompt_kwargs
        )
        return Message(response.text())
    except self.llm.NeedsKeyException as e:
        logging.error("llm generation failed: %s", repr(e))
        return None
    except Exception as e:
        backoff_types = (self.llm.ModelError.__name__, "RateLimitError", "APITimeoutError", "APIConnectionError")
        exc_name = type(e).__name__
        if exc_name in backoff_types:
            raise GeneratorBackoffTrigger from e
        logging.error("llm generation failed: %s", repr(e))
        return None

for _ in range(generations_this_call):
    response = self._get_response(text_prompt, system=system_text, **prompt_kwargs)
    outputs.append(response)
return outputs
```
I moved the retry to _call_single so that each generation gets its own independent retry.
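The per-call retry scoping discussed above can be sketched generically (names hypothetical; a plain retry loop stands in for the backoff decorator). When the retry boundary wraps a single inference call rather than the whole loop, a transient failure on one generation does not discard outputs already collected for earlier generations:

```python
class GeneratorBackoffTrigger(Exception):
    """Retry signal for a single inference attempt."""


def retry(fn, *args, max_tries=3):
    """Retry one call on GeneratorBackoffTrigger, up to max_tries attempts."""
    for attempt in range(max_tries):
        try:
            return fn(*args)
        except GeneratorBackoffTrigger:
            if attempt == max_tries - 1:
                raise


calls = []


def inference(i):
    """Fake inference call that fails transiently on generation 2's first attempt."""
    calls.append(i)
    if i == 2 and calls.count(2) == 1:
        raise GeneratorBackoffTrigger()
    return f"output-{i}"


# Each generation gets its own independent retry boundary.
outputs = [retry(inference, i) for i in range(4)]
print(outputs)         # → ['output-0', 'output-1', 'output-2', 'output-3']
print(calls.count(0))  # → 1: earlier generations ran exactly once despite the retry
```

Had the retry wrapped the whole loop instead, the failure at generation 2 would have re-run generations 0 and 1 as well.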
garak/generators/llm.py
Outdated
```python
        self.name = name
        self._load_config(config_root)
        self.fullname = f"llm:{self.name}"

        super().__init__(self.name, config_root=config_root)
```
Is there some reason to load config early here?

```python
super().__init__(name, config_root=config_root)
self.fullname = f"llm:{self.name}"
```
There was no reason to do that; now I am passing the name directly to super().__init__().
garak/generators/llm.py
Outdated
```python
                exc_name = type(e).__name__
                if exc_name in backoff_types:
                    raise GeneratorBackoffTrigger from e
```
Why is this based on string name values? I would expect to be able to use the types themselves, similar to how the cohere generator does the handling.

See garak/generators/cohere.py, lines 130 to 142 at 87ba832.

Are there multiple types with the same __name__ result that could be raised here but still provide a viable case to retry the request?
Now I'm using isinstance(e, self.llm.ModelError), which is the only thing that triggers a retry; otherwise we log and return None.
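The difference between type-based and name-based matching can be sketched as follows (exception hierarchy hypothetical, standing in for a library's error types):

```python
class ModelError(Exception):
    """Stand-in base error class, as a library like llm might define."""


class RateLimited(ModelError):
    """A subclass that should still be considered retryable."""


def should_retry(exc, retryable=(ModelError,)):
    # isinstance also matches subclasses, which __name__ comparison misses
    return isinstance(exc, retryable)


print(should_retry(RateLimited("slow down")))        # → True: subclass matches
print(type(RateLimited()).__name__ == "ModelError")  # → False: name check misses it
```

This is why the isinstance-based handling (as in the cohere generator) is more robust than comparing `type(e).__name__` strings.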
| "top_p": None, | ||
| "stop": [], |
Consider adding support similar to the OpenAICompatible and LiteLLM generators to suppress or extend options passed to this proxy library.
In particular suppressed_params and extra_params would enable control of additional options to the inference request that may vary depending on the target. These options would need to be incorporated into the logic of _enumerate_model_params().
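A minimal sketch of how suppressed_params and extra_params might feed into parameter enumeration (helper name and filtering logic hypothetical, not garak's actual _enumerate_model_params() implementation):

```python
def build_request_params(defaults, suppressed_params=(), extra_params=None):
    """Drop suppressed options, skip unset (None) ones, then layer on extras."""
    params = {
        k: v
        for k, v in defaults.items()
        if k not in suppressed_params and v is not None
    }
    params.update(extra_params or {})
    return params


defaults = {"temperature": 0.7, "top_p": None, "stop": []}
print(build_request_params(defaults,
                           suppressed_params=("stop",),
                           extra_params={"seed": 42}))
# → {'temperature': 0.7, 'seed': 42}
```

This mirrors the general idea in the OpenAICompatible and LiteLLM generators: suppressed options never reach the proxy library, while extras allow target-specific pass-through.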
I now added suppressed_params and extra_params to the default params. It should now match the pattern in OpenAICompatible and LiteLLM.
…xtra_params to default_params Signed-off-by: Nakul Rajpal <[email protected]>
Fixes issue #1382 → second PR on new branch with updated code
Three files changed:

- docs/source/garak.generators.llm.rst: Sphinx autodoc stub for the new LLM generator module
- garak/generators/llm.py: new generator wrapping Simon Willison's llm library, supporting OpenAI, Claude, Gemini, and other providers via plugins
- tests/generators/test_llm.py: unit tests for LLMGenerator covering instantiation, parameter passthrough, error handling, and multiple generations

Verification
- pip install llm
- garak -m llm -n gpt-4o-mini
- python -m pytest tests/generators/test_llm.py (tests are skipped when the llm package is not installed)
resolves #463