⚡️ Speed up method ReferenceFinder._find_references_in_file by 313% in PR #1335 (gpu-flag) #1356
Open
codeflash-ai[bot] wants to merge 5 commits into `gpu-flag` from `codeflash/optimize-pr1335-2026-02-04T01.22.32`
Conversation
Add a `gpu` parameter to instrument tests with torch.cuda.Event timing instead of time.perf_counter_ns() for measuring GPU kernel execution time. Falls back to CPU timing when CUDA is not available/initialized. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
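For illustration, here is a minimal sketch of the event-based timing pattern this commit describes, assuming PyTorch is installed. The helper name `time_fn` and its structure are hypothetical and not the PR's actual instrumentation code:

```python
import time

import torch


def time_fn(fn, *args, gpu: bool = False, **kwargs):
    """Illustrative helper: CUDA-event timing when gpu=True and CUDA is usable,
    otherwise fall back to time.perf_counter_ns(). Returns (result, elapsed_ns)."""
    if gpu and torch.cuda.is_available():
        start_evt = torch.cuda.Event(enable_timing=True)
        end_evt = torch.cuda.Event(enable_timing=True)
        start_evt.record()
        result = fn(*args, **kwargs)
        end_evt.record()
        # The events must have completed on the device before elapsed_time() is valid.
        torch.cuda.synchronize()
        elapsed_ns = int(start_evt.elapsed_time(end_evt) * 1e6)  # elapsed_time() is in ms
    else:
        start = time.perf_counter_ns()
        result = fn(*args, **kwargs)
        elapsed_ns = time.perf_counter_ns() - start
    return result, elapsed_ns
```

CUDA events are recorded on the GPU stream itself, so the synchronize before `elapsed_time()` is what makes the measurement reflect kernel completion rather than host-side dispatch time.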
Fix unused variables, single-item membership tests, unnecessary lambdas, and ternary expressions that can use `or` operator. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
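A few hypothetical before/after pairs showing the kinds of cleanups this commit names (the actual changed lines are not visible in this view):

```python
DEFAULT_TIMEOUT = 30


def pick_timeout(user_timeout):
    # Before: ternary expression.
    #   return user_timeout if user_timeout else DEFAULT_TIMEOUT
    # After: `or` returns the first truthy operand, which covers the same cases.
    return user_timeout or DEFAULT_TIMEOUT


def is_supported(lang):
    # Before: single-item membership test.
    #   return lang in ("javascript",)
    # After: plain equality is clearer for a single value.
    return lang == "javascript"


def sort_names(names):
    # Before: unnecessary lambda wrapping a callable.
    #   return sorted(names, key=lambda s: str.lower(s))
    # After: pass the callable directly.
    return sorted(names, key=str.lower)


if __name__ == "__main__":
    print(pick_timeout(None), is_supported("javascript"), sort_names(["B", "a"]))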
This optimization achieves a **313% speedup** (from 5.05ms to 1.22ms) by eliminating redundant string decoding operations during AST traversal. The key improvements are:
**What was optimized:**
1. **Node text caching**: Added `_node_text_cache` and `_node_bytes_cache` dictionaries to store decoded text and byte slices for each tree-sitter node, keyed by node ID
2. **Lazy decoding**: Introduced `_get_node_text()` and `_get_node_bytes()` helper methods that cache results on first access
3. **Byte-level comparisons**: Changed identifier matching from string equality (`name == search_name`) to byte equality (`node_bytes == search_bytes`), avoiding UTF-8 decoding unless necessary
4. **Pre-encoded search term**: The `search_name` is encoded once per file as `search_bytes` rather than repeatedly during comparisons (a sketch of these helpers follows this list)
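Below is a minimal sketch of the caching pattern the list above describes, assuming the tree-sitter Python bindings (nodes expose `.id`, `.start_byte`, and `.end_byte`); the exact method bodies in the PR may differ:

```python
class ReferenceFinderSketch:
    """Illustrative caching helpers keyed by tree-sitter node id."""

    def __init__(self, source_bytes: bytes, search_name: str) -> None:
        self._source = source_bytes
        # Encode the search term once per file instead of on every comparison.
        self._search_bytes = search_name.encode("utf-8")
        self._node_bytes_cache: dict[int, bytes] = {}
        self._node_text_cache: dict[int, str] = {}

    def _get_node_bytes(self, node) -> bytes:
        # Slice the raw source at most once per node.
        cached = self._node_bytes_cache.get(node.id)
        if cached is None:
            cached = self._source[node.start_byte:node.end_byte]
            self._node_bytes_cache[node.id] = cached
        return cached

    def _get_node_text(self, node) -> str:
        # Decode lazily: only nodes whose text is actually needed get decoded, once.
        cached = self._node_text_cache.get(node.id)
        if cached is None:
            cached = self._get_node_bytes(node).decode("utf-8")
            self._node_text_cache[node.id] = cached
        return cached

    def _matches_search_name(self, node) -> bool:
        # Byte-level comparison skips UTF-8 decoding entirely for non-matching identifiers.
        return self._get_node_bytes(node) == self._search_bytes

    def _clear_caches(self) -> None:
        # Cleared between files so cached text never leaks across parse trees.
        self._node_bytes_cache.clear()
        self._node_text_cache.clear()
```

Keying on `node.id` works because a node's identity and byte span are fixed for the lifetime of its parse tree; `_clear_caches()` corresponds to the per-file clearing discussed below.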
**Why this is faster:**
The original code repeatedly sliced and decoded the same AST node text during recursive traversal. Line profiler shows `_find_identifier_references` spent 52.1% of time in `child_by_field_name("function")` and 13.9% checking node types, with additional time decoding node text multiple times. The optimization eliminates this redundancy—each node's text is decoded at most once and cached. Byte comparisons are faster than string comparisons in Python and skip decoding entirely when names don't match.
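The decode-versus-bytes cost is easy to reproduce in isolation. Here is a rough, self-contained micro-benchmark; the source string, offsets, and iteration count are made up for illustration and are unrelated to the profiled code:

```python
import timeit

source = ("function findReferences(tree) { return tree.walk(); }\n" * 200).encode("utf-8")
search_name = "findReferences"
search_bytes = search_name.encode("utf-8")
span = (9, 9 + len(search_bytes))  # byte offsets of one identifier occurrence

# Decode the slice, then compare strings (the original pattern).
decode_then_compare = timeit.timeit(
    lambda: source[span[0]:span[1]].decode("utf-8") == search_name, number=200_000
)
# Compare raw byte slices against the pre-encoded search term (the optimized pattern).
bytes_compare = timeit.timeit(
    lambda: source[span[0]:span[1]] == search_bytes, number=200_000
)
print(f"decode+compare: {decode_then_compare:.3f}s  bytes compare: {bytes_compare:.3f}s")
```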
**Impact:**
- The line profiler shows `_find_references_in_file` total time dropped from 21.5ms to 6.6ms (69% reduction)
- The recursive `_find_identifier_references` becomes dramatically faster by avoiding repeated decode operations on the same nodes
- Memory overhead is minimal—caches are cleared per file and only store node IDs and their decoded text
- This optimization particularly benefits files with many function calls or deep AST nesting where the same parent/child nodes are accessed repeatedly
The caching strategy is safe because tree-sitter nodes are immutable within a parse tree, and the caches are explicitly cleared between files to prevent memory leaks or cross-file contamination.
⚡️ This pull request contains optimizations for PR #1335
If you approve this dependent PR, these changes will be merged into the original PR branch `gpu-flag`.
📄 313% (3.13x) speedup for `ReferenceFinder._find_references_in_file` in `codeflash/languages/javascript/find_references.py`
⏱️ Runtime: 5.05 milliseconds → 1.22 milliseconds (best of 8 runs)
📝 Explanation and details
✅ Correctness verification report:
🌀 Generated Regression Tests
To edit these changes, run `git checkout codeflash/optimize-pr1335-2026-02-04T01.22.32` and push.