⚡️ Speed up method `JavaAssertTransformer._detect_framework` by 236% in PR #1295 (`feat/java-remove-asserts-transformer`) #1326
Closed
codeflash-ai[bot] wants to merge 1 commit into `feat/java-remove-asserts-transformer` from `codeflash/optimize-pr1295-2026-02-03T21.33.14`
The optimized code achieves a **236% speedup (15.6ms → 4.64ms)** by replacing the expensive tree-sitter-based parsing with a lightweight regex-based approach for scanning Java imports.

## Key Optimization: Regex-Based Import Scanning

The original implementation called `self.parse(source_bytes)` in `find_imports()`, which invoked the full tree-sitter parser to build an Abstract Syntax Tree (AST). Line profiler shows that this `parse()` call alone consumed **37.5%** of the total runtime of `find_imports()`, and the subsequent `_extract_import_info()` calls consumed another **53.5%**.

The optimized version introduces a precompiled regex pattern (`_IMPORT_RE`) that matches Java import statements directly from text lines. The new `_scan_import_lines()` method:

- Processes source line-by-line without building an AST
- Handles single-line comments (`//`) and block comments (`/* */`) to avoid false matches
- Extracts the same logical information (import path, static flag, wildcard, line numbers) as the original
- Avoids the overhead of tree traversal and node extraction

## Performance Impact on Framework Detection

The `_detect_framework()` method shows the real-world benefit. The original spent **90.1%** of its time calling `find_imports()`. The optimized version spends **80.5%** there, and the absolute time for that call drops from 34.7ms to 18.9ms in the line profile.

Additionally, the detection logic now makes a single pass that sets flags (`found_junit5`, `found_junit4`) instead of making two separate passes through the imports list. This eliminates redundant iterations when JUnit is present alongside specific assertion libraries.
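The single-pass flag logic described above might look roughly like the sketch below. The PR's actual priority rules and return values are not shown on this page, so the library prefixes, the ordering (a specific assertion library wins over JUnit, JUnit 5 over JUnit 4), the `"unknown"` fallback, and the function name `detect_framework` are all assumptions made for illustration.

```python
from typing import Iterable


def detect_framework(import_paths: Iterable[str]) -> str:
    """Illustrative single-pass detection; not the PR's actual code."""
    found_junit5 = False
    found_junit4 = False

    for path in import_paths:
        # Assumed priority: a specific assertion library wins immediately.
        if path.startswith("org.assertj"):
            return "assertj"
        if path.startswith("org.hamcrest"):
            return "hamcrest"
        # Record JUnit sightings so the list never needs a second pass.
        if path.startswith("org.junit.jupiter"):
            found_junit5 = True
        elif path.startswith("org.junit"):
            found_junit4 = True

    # Fall back to JUnit only after every import has been seen.
    if found_junit5:
        return "junit5"
    if found_junit4:
        return "junit4"
    return "unknown"  # hypothetical fallback value
```

Because the JUnit flags are only consulted after the loop, an AssertJ or Hamcrest import that appears after the JUnit imports still takes precedence without re-iterating the list.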
## Test Results Analysis

The annotated tests confirm consistent speedups across all scenarios:

- **Simple cases** (few imports): 158-325% faster (e.g., empty source: 7.54μs → 1.77μs)
- **Medium complexity** (10-20 imports): 165-224% faster (typical test files)
- **Large import lists** (100-1000 imports): 243-291% faster (e.g., 1000 imports: 6.52ms → 1.84ms)
- **Worst case with large file**: 295% faster (1.54ms → 389μs for a file with 100 methods)

The regex approach scales better than tree-sitter for import-heavy files because it makes a single O(n) pass over the source lines with minimal per-line overhead, whereas tree-sitter builds a complete syntax tree even when only import information is needed.

## Workload Suitability

This optimization particularly benefits:

- Build systems that scan many test files to determine framework usage
- IDEs performing quick import analysis for code completion or refactoring
- CI/CD pipelines that analyze test structure across large codebases
- Any workflow that repeatedly calls `_detect_framework()` on test files (as shown by the 100-iteration test: 2.06ms → 625μs for repeated calls)

The optimization preserves all original behavior, including comment handling, wildcard detection, and framework priority rules, making it a drop-in replacement with purely runtime benefits.
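For reference, a minimal sketch of the kind of line-based scan described under Key Optimization. The PR's real `_IMPORT_RE` and `_scan_import_lines()` are not shown on this page, so the pattern, the `ImportInfo` tuple, and the function name below are hypothetical stand-ins. The sketch also does not handle `//` or `/*` appearing inside string literals, which a full parser would.

```python
import re
from typing import NamedTuple

# Hypothetical stand-in for the PR's `_IMPORT_RE`; the real pattern may differ.
IMPORT_RE = re.compile(
    r"^\s*import\s+(static\s+)?"
    r"([A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*)"
    r"(\.\*)?\s*;"
)


class ImportInfo(NamedTuple):
    path: str          # dotted path without a trailing ".*"
    is_static: bool
    is_wildcard: bool
    line: int          # 1-based line number


def scan_import_lines(source: str) -> list[ImportInfo]:
    """Collect imports line by line, skipping // and /* */ comments."""
    imports = []
    in_block_comment = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = []  # characters of this line that sit outside comments
        i = 0
        while i < len(line):
            if in_block_comment:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # comment continues on the next line
                else:
                    in_block_comment = False
                    i = end + 2
            elif line.startswith("//", i):
                break  # rest of the line is a comment
            elif line.startswith("/*", i):
                in_block_comment = True
                i += 2
            else:
                code.append(line[i])
                i += 1
        match = IMPORT_RE.match("".join(code))
        if match:
            imports.append(
                ImportInfo(match.group(2), bool(match.group(1)),
                           bool(match.group(3)), lineno)
            )
    return imports
```

The comment state is carried across lines in `in_block_comment`, so an `import` that appears inside a multi-line `/* ... */` block is never matched.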
Contributor
We don't want to stop using the tree-sitter parser.
⚡️ This pull request contains optimizations for PR #1295

If you approve this dependent PR, these changes will be merged into the original PR branch `feat/java-remove-asserts-transformer`.

📄 236% (2.36x) speedup for `JavaAssertTransformer._detect_framework` in `codeflash/languages/java/remove_asserts.py`

⏱️ Runtime: 15.6 milliseconds → 4.64 milliseconds (best of 135 runs)
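The headline percentage appears to be the relative reduction measured against the optimized runtime, i.e. `original / optimized - 1`. That formula is inferred from the numbers on this page rather than stated by the tool, and the helper name `speedup_percent` is hypothetical. A quick check:

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Speedup as a percentage, assuming (original / optimized - 1) * 100."""
    return (original / optimized - 1.0) * 100.0


# Runtimes reported for this PR, in milliseconds.
print(round(speedup_percent(15.6, 4.64)))  # → 236
```

Under this convention, a 236% speedup means the optimized code runs in roughly 30% of the original time.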
✅ Correctness verification report:
To edit these changes, `git checkout codeflash/optimize-pr1295-2026-02-03T21.33.14` and push.