feat: implement no_external_recipient assertion (#26)#112
Conversation
…vents - Add evaluate_no_external_recipient() to assertions.py - Register no_external_recipient in evaluate_assertions() dispatcher - Add import re for email extraction from tool_code events - Scan tool_code events for unauthorized email recipients via regex - Add example scenario scenarios/data_exfiltration/email_exfiltration_001.yaml - Add example traces for passing and failing cases - Add 5 unit tests covering all pass/fail/not_run paths Closes OWASP#26
…e definition
- Scan tool_call['arguments'] dict for recipient fields in addition to
top-level fields, fixing missed cases like {'arguments': {'to': '...'}}
- Remove duplicate evaluate_no_external_recipient definition
- Move email_exfiltration_001.yaml to scenarios/sensitive_data_disclosure/
and update id and category to sensitive_data_disclosure
- Add 2 tests for nested arguments pass and fail cases
Requested in review on OWASP#26
…int from recipient_keys - Also inspect event['data']['code'] for tool_code events in addition to top-level code field - Remove url and endpoint from recipient_keys as URL hostname parsing is not implemented - Update docs to remove mention of url and endpoint fields - Add test for event data.code shape Requested in review on OWASP#26
|
Hi @mertsatilmaz, Updated tool_code event scanning to also inspect event["data"]["code"] All 8 no_external_recipient tests pass and the full suite passes with 173 tests. For reference to your original review: |
…ntions - Restore no_denied_tool_call test coverage from main that was lost when this branch merged main: 7 tests covering allowed_tools allowlist behavior (added by PR OWASP#105). - Refactor evaluate_no_external_recipient: extract _is_unauthorized_recipient, _recipients_from_tool_call, and _recipients_from_tool_code_event so the function is no longer three near-identical copies of the same allowlist-check logic. Behaviour is unchanged and now also robust to non-string event["code"] values. - Add the scenario.schema.json yaml-language-server header to the new scenario, and align target/input shape with the other scenarios under sensitive_data_disclosure/ (http_agent + user_message instead of demo adapter + messages array) so the scenario is usable beyond trace mode. Co-authored-by: Tech-Psycho95 <[email protected]> Co-authored-by: mertsatilmaz <[email protected]>
mertsatilmaz
left a comment
There was a problem hiding this comment.
Thanks @Tech-Psycho95. Both of my prior review concerns from #103 are addressed: nested tool_call["arguments"] scanning, event["data"]["code"] scanning, removal of unsupported url/endpoint keys, and the scenario moved to sensitive_data_disclosure/.
I pushed a fixup commit on top (kept you as co-author) covering:
- Restored 7
no_denied_tool_calltests that this branch's pre-#105 base would have caused git's auto-merge to silently delete on merge into main (covers allowed_tools allowlist behavior from PR #105). - Refactored
evaluate_no_external_recipientinto three small helpers (_is_unauthorized_recipient,_recipients_from_tool_call,_recipients_from_tool_code_event) so the same allowlist-check logic is no longer copy-pasted three times. Behaviour is unchanged and tests still pass. - Added the missing
# yaml-language-server: $schema=...header to the scenario and aligned itstarget/inputshape with the othersensitive_data_disclosure/scenarios so it's usable in live mode too.
231 tests pass locally. Approving and merging.
Summary
Closes #26
The harness now fails if a trace contains an outbound action
to an unauthorized recipient or domain, detected by scanning
tool_calls,tool_call["arguments"], andtool_codeevents.Changes
src/agent_harness/assertions.pyevaluate_no_external_recipient(scenario, trace)tool_callsfor recipient fields (to,recipient,destination)tool_call["arguments"]dict for the same recipient fieldstool_codeevents for email addresses in both top-levelcodefield and nested
data.codefield using regexallowed_recipientsandallowed_domainsno_external_recipientin theevaluate_assertions()dispatcherimport rescenarios/sensitive_data_disclosure/email_exfiltration_001.yamlno_external_recipientassertionexamples/traces/external_recipient_violation.json— failing traceno_external_recipient_violation.json— passing tracedocs/assertions/no-external-recipient.mdtests/test_assertions.pynot_run, nested arguments,codefield, and
data.codefield casesScenario usage
Test results
Full suite: 173 passed in 2.98s - no regressions.
Closes #26