fix: use parameterized SQL queries in report_digest.py#1645
fix: use parameterized SQL queries in report_digest.py#1645gn00295120 wants to merge 2 commits intoNVIDIA:mainfrom
Conversation
|
DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅ |
There was a problem hiding this comment.
Pull request overview
This PR hardens garak/analyze/report_digest.py against SQL injection from untrusted .report.jsonl inputs by replacing f-string SQL construction with SQLite parameterized queries.
Changes:
- Parameterized the
INSERTinto the in-memoryresultstable in_init_populate_result_db. - Parameterized
SELECTqueries filtering byprobe_groupin_get_group_aggregate_scoreand_get_probe_result_summaries. - Parameterized the detector lookup query filtering by
probe_groupandprobe_classin_get_detectors_info.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _get_probe_result_summaries(cursor, probe_group) -> List[tuple]: | ||
| res = cursor.execute( | ||
| f"select probe_module, probe_class, min(score) as s from results where probe_group='{probe_group}' group by probe_class order by s asc, probe_class asc;" | ||
| "select probe_module, probe_class, min(score) as s from results where probe_group=? group by probe_class order by s asc, probe_class asc;", | ||
| (probe_group,), | ||
| ) |
There was a problem hiding this comment.
Consider adding a targeted regression test that exercises these queries with adversarial probe_group / probe_class / detector values containing quotes or SQL metacharacters (e.g., "x' OR 1=1 --") and asserts build_digest() still succeeds and returns expected results. Current tests run the CLI over known-good assets but don't specifically cover the injection/escaping behavior this change is meant to fix.
|
I have read the DCO Document and I hereby sign the DCO |
93a4d84 to
0b0f1b7
Compare
Add parameterized tests exercising _init_populate_result_db, _get_group_aggregate_score, _get_probe_result_summaries, and _get_detectors_info with adversarial SQL metacharacter payloads (e.g., "x' OR 1=1 --", "'; DROP TABLE results; --"). Addresses Copilot review suggestion on PR NVIDIA#1645. Signed-off-by: Lucas Wang <[email protected]>
|
@gn00295120 using Copilot or another assistant to help create a PR is acceptable and even encouraged if it helps you understand your contribution, however this should be confined to your private fork at this time. Asking the assistant to review and comment on the public PR after being opened upstream places a maintenance burdens on the project for tools that are not used by all contributors. In the future please keep this in mind. |
jmartin-tech
left a comment
There was a problem hiding this comment.
@gn00295120 thank you for the contribution. This looks great and will go thru additional testing.
As an additional note to address the reference to CWE-89 in the description, this is a valid case of an improvement to follow a more secure practice in sql statement generation, the values here are only user controlled when utilizing this class as a standalone utility and the database is ephemeral to the report digest creation process. This simply mean the security evaluation of risk show no exposed vulnerable attack chain existed. This does not diminish the value of this PR in improving the codebase and reducing the possible attack surfaces.
Replace f-string interpolation with parameterized queries using ? placeholders to prevent potential SQL injection from malformed report JSONL data. Addresses CWE-89. Signed-off-by: Lucas Wang <[email protected]>
Add parameterized tests exercising _init_populate_result_db, _get_group_aggregate_score, _get_probe_result_summaries, and _get_detectors_info with adversarial SQL metacharacter payloads (e.g., "x' OR 1=1 --", "'; DROP TABLE results; --"). Addresses Copilot review suggestion on PR NVIDIA#1645. Signed-off-by: Lucas Wang <[email protected]>
fd3f7b5 to
4a1878a
Compare
|
Thank you for the detailed review and clarification — I really appreciate it. |
Summary
garak/analyze/report_digest.pywith parameterized queries using?placeholders_init_populate_result_db,_get_group_aggregate_score,_get_probe_result_summaries,_get_detectors_infoBackground
The affected queries interpolated values derived from report JSONL data directly into SQL strings via f-strings. A malformed or adversarially crafted
.report.jsonlfile could include SQL metacharacters in field values such asprobe,detector, or taxonomy tag names, causing unintended query behavior against the in-memory SQLite database.Parameterized queries pass values as a separate tuple argument to
cursor.execute(), so the SQLite driver handles escaping unconditionally — no string manipulation required.Changes
_init_populate_result_db(line ~131): INSERT with 7?placeholders_get_group_aggregate_score(line ~159): SELECT filtered byprobe_group_get_probe_result_summaries(line ~236): SELECT filtered byprobe_group_get_detectors_info(line ~262): SELECT filtered byprobe_groupandprobe_classNo logic, variable names, return types, or other behavior was changed.
Test plan
pytest tests/— all tests should pass without modification.report.jsonl