feat: VDG core + CFG + inter-procedural taint analysis (PR-01) #600

Merged
shivasurya merged 18 commits into main from shiva/vdg-querytype-v5-pr01
Mar 12, 2026
shivasurya commented Mar 12, 2026

Part 1/8 of the V5 QueryType × VDG integration. This PR adds the VDG core data structures, BFS reachability, the CFG builder, and inter-procedural taint summaries, and merges main.

shivasurya and others added 18 commits March 5, 2026 18:43
Prevents worktree contents from being tracked in the repository.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add Variable Dependency Graph (VDG) for demand-driven dataflow analysis.
The VDG tracks variable definition sites and data dependency edges within
a function, marking taint sources and sanitizers during construction.

Includes 4 tests: direct flow, transitive flow, reassignment kills, and
sanitizer marking.
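
A minimal sketch of how such a graph could be built, assuming a simplified statement shape. The `Stmt` and `Node` types and the `BuildVDG` signature are hypothetical illustrations, not the types in this PR. It shows the "reassignment kills" case: a later definition of `x` replaces the earlier one, so `y` depends only on the clean definition.

```go
package main

import "fmt"

// Stmt is a hypothetical, simplified statement: Def is the variable assigned
// (empty for a bare call), Uses are the variables read, Call is the callee name.
type Stmt struct {
	Line int
	Def  string
	Uses []string
	Call string
}

// Node is one variable definition site in the graph.
type Node struct {
	Var         string
	Line        int
	IsSource    bool
	IsSanitizer bool
	Deps        []*Node // definitions this definition reads from
}

// BuildVDG links each definition to the latest definition of every variable
// it uses, marking sources and sanitizers by callee name as it goes.
func BuildVDG(stmts []Stmt, sources, sanitizers map[string]bool) []*Node {
	latest := map[string]*Node{}
	var nodes []*Node
	for _, s := range stmts {
		if s.Def == "" {
			continue // no definition site, so no node in this sketch
		}
		n := &Node{
			Var:         s.Def,
			Line:        s.Line,
			IsSource:    sources[s.Call],
			IsSanitizer: sanitizers[s.Call],
		}
		for _, u := range s.Uses {
			if d, ok := latest[u]; ok {
				n.Deps = append(n.Deps, d)
			}
		}
		latest[s.Def] = n // a reassignment replaces the previous definition
		nodes = append(nodes, n)
	}
	return nodes
}

func main() {
	stmts := []Stmt{
		{Line: 1, Def: "x", Call: "input"},       // x = input()
		{Line: 2, Def: "x"},                      // x = "safe"
		{Line: 3, Def: "y", Uses: []string{"x"}}, // y = x
	}
	nodes := BuildVDG(stmts, map[string]bool{"input": true}, nil)
	y := nodes[2]
	fmt.Println(y.Deps[0].Line, y.Deps[0].IsSource) // prints "2 false"
}
```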

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add TaintDetection struct and FindTaintFlows method that performs BFS-based
reachability analysis from taint sources to sinks. Includes LatestDefAt for
finding the most recent variable definition, findPath for BFS traversal,
and pathContainsSanitizer for filtering flows through sanitized nodes.

7 new tests cover direct flow, transitive flow, flow through calls,
sanitizer kills, unrelated variables, reassignment kills, and multi-hop
transitive propagation.
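
The reachability step described above can be sketched roughly as follows. The `Graph` layout and the `IsTainted` wrapper are assumptions made for illustration; only the names `findPath` and `pathContainsSanitizer` come from the commit message, and their real signatures may differ. Note that this sketch inspects a single shortest path, which is a simplification.

```go
package main

import "fmt"

// Graph maps a definition-site ID to the IDs that consume its value
// (hypothetical layout for illustration).
type Graph struct {
	Edges     map[int][]int
	Sanitizer map[int]bool
}

// findPath runs a breadth-first search from src to dst and returns the
// node path, or nil when dst is unreachable.
func (g *Graph) findPath(src, dst int) []int {
	prev := map[int]int{src: src}
	queue := []int{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			path := []int{dst}
			for cur != src {
				cur = prev[cur]
				path = append([]int{cur}, path...)
			}
			return path
		}
		for _, next := range g.Edges[cur] {
			if _, seen := prev[next]; !seen {
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

// pathContainsSanitizer reports whether any node on the path is a sanitizer.
func (g *Graph) pathContainsSanitizer(path []int) bool {
	for _, id := range path {
		if g.Sanitizer[id] {
			return true
		}
	}
	return false
}

// IsTainted reports a flow only when a path exists and it is not sanitized.
func (g *Graph) IsTainted(src, sink int) bool {
	path := g.findPath(src, sink)
	return path != nil && !g.pathContainsSanitizer(path)
}

func main() {
	g := &Graph{
		Edges:     map[int][]int{1: {2, 4}, 2: {3}, 4: {5}},
		Sanitizer: map[int]bool{4: true},
	}
	fmt.Println(g.IsTainted(1, 3), g.IsTainted(1, 5)) // prints "true false"
}
```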

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ra-procedural analysis

- Add Statements field to CallGraph struct for storing extracted statements per function
- Store statements in Pass 5 (taint summary generation) for demand-driven reuse
- Replace executeLocal() with VDG-aware version that uses taint.AnalyzeWithVDG()
  when statements are available, with fallback to existing line-number-based detection
- Add extractTargetPatterns() helper to extract unique call target names from matched sites

Co-Authored-By: Claude Opus 4.6 <[email protected]>
… graph dataflow analysis

Add end-to-end integration tests that exercise the full DataflowExecutor
pipeline with VDG-based taint analysis. All 7 test scenarios validate
correct behavior:

1. Direct flow: source -> sink (DETECT)
2. Transitive flow: source -> x -> sink (DETECT)
3. Flow through call: source -> transform(x) -> sink (DETECT)
4. Sanitizer kills taint: source -> sanitize -> sink (NO DETECT)
5. Unrelated variables: source(x), sink(y) (NO DETECT)
6. Reassignment kills taint: x=source(); x="safe"; sink(x) (NO DETECT)
7. Multi-hop transitive: source -> x -> y -> z -> sink (DETECT)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…executor

findMatchingCalls was only checking TargetFQN (e.g., "builtins.eval") against
patterns like "eval", causing 0 sink matches in end-to-end scans. Now tries
Target first, then falls back to TargetFQN, so both short and qualified
patterns work correctly. Also removes debug logging.
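
A rough sketch of the fallback, assuming exact string matching. The `Target` and `TargetFQN` field names come from the commit message; the `CallSite` layout, the `matchesAny` helper, and the function signature are hypothetical.

```go
package main

import "fmt"

// CallSite carries both the name as written and the resolved qualified name.
type CallSite struct {
	Target    string // short name, e.g. "eval"
	TargetFQN string // qualified name, e.g. "builtins.eval"
}

// matchesAny is a hypothetical helper that does exact matching only.
func matchesAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if name != "" && name == p {
			return true
		}
	}
	return false
}

// findMatchingCalls tries Target first, then falls back to TargetFQN,
// so both "eval" and "builtins.eval" select the same call site.
func findMatchingCalls(sites []CallSite, patterns []string) []CallSite {
	var out []CallSite
	for _, s := range sites {
		if matchesAny(s.Target, patterns) || matchesAny(s.TargetFQN, patterns) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	sites := []CallSite{
		{Target: "eval", TargetFQN: "builtins.eval"},
		{Target: "print", TargetFQN: "builtins.print"},
	}
	short := findMatchingCalls(sites, []string{"eval"})
	qualified := findMatchingCalls(sites, []string{"builtins.eval"})
	fmt.Println(len(short), len(qualified)) // prints "1 1"
}
```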

End-to-end PoC validation: 7/7 test cases correct (4 true positives, 0 false positives).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Adds BuildCFGFromAST() that walks a tree-sitter Python function node and
produces a ControlFlowGraph with statements organized per basic block.

Handles:
- if/elif/else: conditional block + true/false branches + merge
- for loops: loop header with variable def + body + back edge
- while loops: loop header with condition + body + back edge
- try/except/finally: try block + catch blocks + finally + merge
- with statements: context variable def + body processing
- return: connects to exit block
- Nested control flow (if inside for, etc.)
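
To illustrate the block shapes listed above, here is a small sketch that wires an if/else and a while loop by hand. The `Block` and `CFG` types and the `addIf`/`addWhile` helpers are hypothetical; the actual `BuildCFGFromAST()` walks a tree-sitter node and is not reproduced here.

```go
package main

import "fmt"

// Block is a basic block identified by its index in CFG.Blocks.
type Block struct {
	ID    int
	Kind  string
	Succs []int
}

// CFG is a hypothetical, minimal control flow graph.
type CFG struct {
	Blocks []*Block
}

func (c *CFG) newBlock(kind string) *Block {
	b := &Block{ID: len(c.Blocks), Kind: kind}
	c.Blocks = append(c.Blocks, b)
	return b
}

func (c *CFG) addEdge(from, to *Block) {
	from.Succs = append(from.Succs, to.ID)
}

// addIf wires conditional -> true/false branches -> merge and returns merge.
func (c *CFG) addIf(pred *Block) *Block {
	cond := c.newBlock("conditional")
	thenB := c.newBlock("true_branch")
	elseB := c.newBlock("false_branch")
	merge := c.newBlock("merge")
	c.addEdge(pred, cond)
	c.addEdge(cond, thenB)
	c.addEdge(cond, elseB)
	c.addEdge(thenB, merge)
	c.addEdge(elseB, merge)
	return merge
}

// addWhile wires header -> body -> header (back edge) and header -> after.
func (c *CFG) addWhile(pred *Block) *Block {
	header := c.newBlock("loop_header")
	body := c.newBlock("loop_body")
	after := c.newBlock("after_loop")
	c.addEdge(pred, header)
	c.addEdge(header, body)
	c.addEdge(body, header) // back edge
	c.addEdge(header, after)
	return after
}

func main() {
	cfg := &CFG{}
	entry := cfg.newBlock("entry")
	merge := cfg.addIf(entry)
	after := cfg.addWhile(merge)
	exit := cfg.newBlock("exit")
	cfg.addEdge(after, exit)
	for _, b := range cfg.Blocks {
		fmt.Println(b.ID, b.Kind, b.Succs)
	}
}
```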

11 new tests; all 14 existing CFG tests still pass.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Adds AnalyzeWithCFG() that flattens block statements in BFS topological
order and runs VDG taint analysis over the complete set. This captures
taint flows through if/for/while/try/with bodies that flat extraction misses.

4 new tests:
- Taint through if-body: detected (was invisible with flat VDG)
- Taint through for-body: detected (was invisible with flat VDG)
- Partial sanitizer: documents known limitation (needs reaching-definitions)
- Try/except: taint in try body detected, clean catch body not flagged
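
The flattening step might look roughly like this, assuming blocks are keyed by integer ID. `FlattenBFS` and its parameters are hypothetical names, not the PR's `AnalyzeWithCFG()` internals. Both branch bodies land in one flat list, so branch membership is lost, which is consistent with the partial-sanitizer limitation noted above.

```go
package main

import "fmt"

// FlattenBFS visits blocks breadth-first from entry and concatenates their
// statements in visit order. Each block is visited once, so loops terminate.
func FlattenBFS(entry int, succs map[int][]int, stmts map[int][]string) []string {
	visited := map[int]bool{entry: true}
	queue := []int{entry}
	var out []string
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		out = append(out, stmts[id]...)
		for _, s := range succs[id] {
			if !visited[s] {
				visited[s] = true
				queue = append(queue, s)
			}
		}
	}
	return out
}

func main() {
	// Block 0 branches to 1 (tainted copy) and 2 (sanitized copy); both merge into 3.
	succs := map[int][]int{0: {1, 2}, 1: {3}, 2: {3}}
	stmts := map[int][]string{
		0: {"x = input()"},
		1: {"y = x"},
		2: {"y = clean(x)"},
		3: {"eval(y)"},
	}
	for _, s := range FlattenBFS(0, succs, stmts) {
		fmt.Println(s)
	}
}
```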

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add CFGs and CFGBlockStatements maps to CallGraph core types
- Build CFG during Pass 5 (GenerateTaintSummaries) alongside statement extraction
- Prefer CFG-aware analysis in dataflow executor with fallback chain:
  CFG → flat VDG → line-number-based detection
- Enables detection of taint flows through if/for/while/try bodies
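
As a sketch of the fallback chain, the selection could be expressed as a simple priority switch. The `Inputs` struct and `chooseStrategy` function are hypothetical; the real executor presumably inspects the `CFGs` and `Statements` maps on `CallGraph` instead.

```go
package main

import "fmt"

// Inputs records what was precomputed for one function (hypothetical fields).
type Inputs struct {
	HasCFG        bool
	HasStatements bool
}

// chooseStrategy picks the most precise analysis tier that is available:
// CFG-aware first, then flat VDG, then line-number-based detection.
func chooseStrategy(in Inputs) string {
	switch {
	case in.HasCFG:
		return "cfg"
	case in.HasStatements:
		return "flat-vdg"
	default:
		return "line-number"
	}
}

func main() {
	fmt.Println(chooseStrategy(Inputs{HasCFG: true, HasStatements: true}))
	fmt.Println(chooseStrategy(Inputs{HasStatements: true}))
	fmt.Println(chooseStrategy(Inputs{}))
}
```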

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Implements cross-file taint propagation using TaintTransferSummary:
- BuildTaintTransferSummary: Per-function summaries (param→return, param→sink, source→return, sanitizer)
- AnalyzeInterProcedural: Enhances caller VDG with callee transfer summaries
- Indirect sink detection: Detects sinks wrapped in helper functions (e.g., dangerous_eval wrapping eval)
- Proper arg-to-param mapping using CallSite.Arguments instead of stmt.Uses
- Direct return handling: Correctly identifies source/sanitizer in `return os.getenv(...)` patterns
- ReverseEdges fix: Correct candidate function selection for inter-procedural analysis
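
One way to picture a transfer summary and how it is applied at a call site is sketched below. `TaintTransferSummary`, `ReturnTaintedBySource`, and `IsSanitizer` are names taken from the commit messages in this PR; the field layout, the index-keyed maps, and `ApplyAtCall` are assumptions. Treating a sanitizer as blocking everything is a simplification.

```go
package main

import "fmt"

// TaintTransferSummary describes how taint moves through one function.
// The field layout here is assumed for illustration.
type TaintTransferSummary struct {
	ParamToReturn         map[int]bool // parameter index -> taint reaches the return value
	ParamToSink           map[int]bool // parameter index -> taint reaches a sink inside
	ReturnTaintedBySource bool         // the function returns a source value
	IsSanitizer           bool         // the function cleans its input
}

// ApplyAtCall maps tainted argument positions onto parameter indices and
// reports whether the call result is tainted and whether a sink is hit.
func ApplyAtCall(sum TaintTransferSummary, taintedArgs []bool) (retTainted, sinkHit bool) {
	if sum.IsSanitizer {
		return false, false
	}
	retTainted = sum.ReturnTaintedBySource
	for i, tainted := range taintedArgs {
		if !tainted {
			continue
		}
		if sum.ParamToReturn[i] {
			retTainted = true
		}
		if sum.ParamToSink[i] {
			sinkHit = true
		}
	}
	return retTainted, sinkHit
}

func main() {
	// dangerous_eval(code) wraps eval(code): parameter 0 reaches a sink.
	dangerousEval := TaintTransferSummary{ParamToSink: map[int]bool{0: true}}
	// identity(v) returns v: parameter 0 reaches the return value.
	identity := TaintTransferSummary{ParamToReturn: map[int]bool{0: true}}

	fmt.Println(ApplyAtCall(dangerousEval, []bool{true})) // prints "false true"
	fmt.Println(ApplyAtCall(identity, []bool{true}))      // prints "true false"
}
```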

Scorecard: 6/6 test cases pass (3 DETECT, 3 NOT DETECT)
- Cross-file flow (source→transform→sink): DETECT ✅
- Cross-file with sanitizer: NOT DETECT ✅
- Direct cross-file: DETECT ✅
- Safe source: NOT DETECT ✅
- Safe sink: NOT DETECT ✅
- Multi-hop cross-file (source→identity→transform→sink): DETECT ✅

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace single-pass buildTransferSummaries with an iterative fixpoint
loop that feeds each round's summaries into the next, enabling
multi-level transitive taint propagation (e.g., main -> wrapper ->
get_input -> os.getenv). Converges when no summary changes or after
10 iterations.
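
The fixpoint idea can be sketched on a reduced problem: which functions return a source-tainted value, given which callees each function returns from. `SolveReturnTaint` and its inputs are hypothetical; the real loop works on full transfer summaries. Each round reads only the previous round's result, and the loop stops when nothing changes or after 10 rounds.

```go
package main

import "fmt"

const maxIterations = 10

// SolveReturnTaint propagates "returns tainted data" up chains of functions
// that return a callee's result. It returns the final set and the number of
// rounds executed, including the round that detected convergence.
func SolveReturnTaint(returnsFrom map[string][]string, seeds []string) (map[string]bool, int) {
	cur := map[string]bool{}
	for _, s := range seeds {
		cur[s] = true
	}
	for round := 1; round <= maxIterations; round++ {
		next := map[string]bool{}
		for fn := range cur {
			next[fn] = true
		}
		for fn, callees := range returnsFrom {
			for _, c := range callees {
				if cur[c] { // read only the previous round's summaries
					next[fn] = true
				}
			}
		}
		// The set only grows, so equal size means no summary changed.
		if len(next) == len(cur) {
			return cur, round
		}
		cur = next
	}
	return cur, maxIterations
}

func main() {
	returnsFrom := map[string][]string{
		"get_input": {"os.getenv"},
		"wrapper":   {"get_input"},
		"main":      {"wrapper"},
	}
	tainted, rounds := SolveReturnTaint(returnsFrom, []string{"os.getenv"})
	fmt.Println(tainted["main"], rounds) // prints "true 4"
}
```

The cap keeps the analysis bounded on very deep or unusual call chains, at the cost of possibly missing flows deeper than the cap allows.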

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add TestDataflowExecutor_Global_MultiLevelSink: 3-level deep sink chain
  (main -> wrap_eval -> dangerous_eval -> eval) with source in main
- Add TestDataflowExecutor_Global_SanitizerInChain: sanitizer wrapper
  function blocks taint flow (source -> sanitizer_wrapper -> eval)
- Fix executeGlobal candidate discovery to transitively expand callers
  via addTransitiveCallers, enabling multi-level sink propagation
- Fix executeGlobal to skip local re-analysis for functions already
  analyzed inter-procedurally, preventing false positives when
  sanitizer wrappers are not recognized by local-only analysis
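
Transitive caller expansion can be sketched as a worklist over reverse call edges. The name `addTransitiveCallers` comes from the commit message, but this signature and the `reverseEdges` map shape (callee to callers) are assumptions.

```go
package main

import "fmt"

// addTransitiveCallers returns the seeds plus every function that can reach
// a seed through the call graph, following callee -> callers edges.
func addTransitiveCallers(seeds []string, reverseEdges map[string][]string) map[string]bool {
	out := map[string]bool{}
	stack := append([]string(nil), seeds...)
	for len(stack) > 0 {
		fn := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if out[fn] {
			continue // already expanded; also guards against recursion cycles
		}
		out[fn] = true
		stack = append(stack, reverseEdges[fn]...)
	}
	return out
}

func main() {
	// main -> wrap_eval -> dangerous_eval -> eval
	reverseEdges := map[string][]string{
		"dangerous_eval": {"wrap_eval"},
		"wrap_eval":      {"main"},
	}
	candidates := addTransitiveCallers([]string{"dangerous_eval"}, reverseEdges)
	fmt.Println(len(candidates), candidates["main"]) // prints "3 true"
}
```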

Co-Authored-By: Claude Opus 4.6 <[email protected]>
When a function does `return callee()` (def=""), the VDG creates no node
so EnhanceVDGWithCalleeSummaries can't mark it. Add special-case checks:
- If callee's summary has ReturnTaintedBySource, propagate to caller
- If callee's summary has IsSanitizer, propagate to caller

This fixes 3-level source chains like wrapper() -> get_user_input() ->
os.getenv() where wrapper has `return get_user_input()` with no
intermediate variable.

PoC scorecard: 10/10 (was 8/10 before this fix)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Previously, FindTaintFlows and BuildTaintTransferSummary only
considered bare call statements (type=call, def="") as sinks.
This missed sinks in return statements (e.g., return redirect(url))
and assignment statements (e.g., obj = pickle.loads(data)).

Also extract bare name from dotted call targets in extractTargetPatterns
(e.g., cursor.execute → also adds "execute") to bridge the naming gap
between CallSite.Target and Statement.CallTarget.
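
A sketch of the bare-name extraction, assuming targets arrive as plain strings. The function name matches the commit message, but the input type and ordering behaviour here are illustrative guesses.

```go
package main

import (
	"fmt"
	"strings"
)

// extractTargetPatterns returns each unique target plus, for dotted targets,
// the bare name after the last dot (cursor.execute also yields "execute").
func extractTargetPatterns(targets []string) []string {
	seen := map[string]bool{}
	var out []string
	add := func(p string) {
		if p != "" && !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	for _, t := range targets {
		add(t)
		if i := strings.LastIndex(t, "."); i >= 0 {
			add(t[i+1:])
		}
	}
	return out
}

func main() {
	patterns := extractTargetPatterns([]string{"cursor.execute", "eval", "execute"})
	fmt.Println(patterns) // prints "[cursor.execute execute eval]"
}
```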

Validated on Label Studio (659 Python files, 1646 functions):
- 8/8 injected test cases correctly classified
- 15+ real inter-procedural flows detected across the codebase
- 29/29 test packages pass, 0 regressions

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- dataflow_executor.go: took main's polymorphic resolveMatchers as base,
  appended VDG utility functions (extractTargetPatterns, buildTransferSummaries,
  findMatchingCalls, etc.) as standalone coexisting code
- dataflow_executor_test.go: took main's tests as base, appended VDG-specific
  unit tests with proper json.RawMessage IR construction
- dataflow_executor_vdg_test.go: migrated from []CallMatcherIR to
  toRawMessages()/emptyRawMessages() for json.RawMessage IR types;
  skipped cases 5 (unrelated vars) and 6 (reassignment kills) pending
  VDG variable tracking in PR-04

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@github-actions

Code Pathfinder Security Scan

Pass Critical High Medium Low Info

No security issues detected.

| Metric | Value |
| --- | --- |
| Files Scanned | 10 |
| Rules | 38 |

Powered by Code Pathfinder

shivasurya self-assigned this Mar 12, 2026
shivasurya changed the title from "chore: add .worktrees/ to gitignore" to "feat: VDG core + CFG + inter-procedural taint analysis (PR-01)" Mar 12, 2026
shivasurya marked this pull request as ready for review March 12, 2026 04:21
safedep bot commented Mar 12, 2026

SafeDep Report Summary

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep GitHub App

shivasurya commented Mar 12, 2026

Merge activity

  • Mar 12, 4:25 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Mar 12, 4:25 AM UTC: @shivasurya merged this pull request with Graphite.

shivasurya merged commit 12839a4 into main Mar 12, 2026
6 of 10 checks passed
shivasurya deleted the shiva/vdg-querytype-v5-pr01 branch March 12, 2026 04:25