⚡️ Speed up method JavaAnalyzer.find_fields by 11% in PR #1199 (omni-java) #1300

Closed
codeflash-ai[bot] wants to merge 1 commit into omni-java from codeflash/optimize-pr1199-2026-02-03T10.35.55

@codeflash-ai codeflash-ai bot commented Feb 3, 2026

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.


📄 11% (0.11x) speedup for JavaAnalyzer.find_fields in codeflash/languages/java/parser.py

⏱️ Runtime: 12.0 milliseconds → 10.8 milliseconds (best of 140 runs)

📝 Explanation and details

The optimization achieves an 11% runtime improvement (12.0ms → 10.8ms) by reducing overhead in the recursive tree-walking algorithm that processes Java AST nodes.

Key Optimizations:

  1. Cached node.type lookup: The original code accessed node.type multiple times per recursion (3 times in most iterations). By caching it once as node_type, we eliminate repeated attribute lookups across ~10,000 recursive calls, saving ~3.5ms in the profiler data.

  2. Early return pattern for class declarations: When encountering a class declaration, the optimized version processes its children immediately and returns, avoiding the generic child iteration loop below. This eliminates ~876 redundant recursive calls (visible in the profiler: 19,552 calls reduced to 17,800 + 1,752), saving ~2ms in recursion overhead.

  3. Simplified control flow: Removed the new_class variable and conditional assignment (new_class if node.type == "class_declaration" else current_class), which was being evaluated 9,776 times. The early return pattern makes this unnecessary.
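The three changes above can be illustrated with a minimal sketch. It does not reproduce the actual `_walk_tree_for_fields` code: the `Node` class is a hypothetical stand-in for the parser's node objects, and the `"field_declaration"` check and the `(class, field)` tuples are assumptions made only so the two walkers can be compared.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a parser node (not the real parser API)."""
    type: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)


Found = List[Tuple[Optional[str], str]]


def walk_original(node: Node, current_class: Optional[str], out: Found) -> None:
    """Pre-optimization shape: node.type is read several times per call."""
    new_class = current_class
    if node.type == "class_declaration":
        new_class = node.name
    if node.type == "field_declaration":
        out.append((current_class, node.name))
    for child in node.children:
        # Conditional re-evaluated for every child of every node.
        next_class = new_class if node.type == "class_declaration" else current_class
        walk_original(child, next_class, out)


def walk_optimized(node: Node, current_class: Optional[str], out: Found) -> None:
    """Optimized shape: cache node.type once and return early for classes."""
    node_type = node.type  # single attribute read per call
    if node_type == "class_declaration":
        class_name = node.name
        for child in node.children:
            walk_optimized(child, class_name, out)
        return  # early return: the generic loop below is never reached
    if node_type == "field_declaration":
        out.append((current_class, node.name))
    for child in node.children:
        walk_optimized(child, current_class, out)


# Outer { outerField; Inner { innerField; } }
tree = Node("program", children=[
    Node("class_declaration", "Outer", [
        Node("field_declaration", "outerField"),
        Node("class_declaration", "Inner", [
            Node("field_declaration", "innerField"),
        ]),
    ]),
])

before: Found = []
after: Found = []
walk_original(tree, None, before)
walk_optimized(tree, None, after)
assert before == after  # both walkers report the same fields
```

With a plain dataclass the cached read saves very little. The gain reported here presumably comes from `node.type` being a more expensive attribute on the real parser's node objects, which is an assumption rather than something the report states.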

Performance Profile:

  • Line profiler shows _walk_tree_for_fields improved from 59.6ms to 55.8ms (6.4% faster)
  • Reduction in recursive call overhead: self-time decreased from ~15.7ms to ~14.2ms
  • node.type comparisons improved from 7.1ms to 5.9ms (17% faster)
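The percentages in this report appear to follow two different conventions. The arithmetic below is a quick check, assuming the headline speedup is computed relative to the new runtime and the profiler bullets relative to the old one. That split is an inference from the numbers, not something the report states, and the function names are illustrative.

```python
def speedup_vs_new(old: float, new: float) -> float:
    """Time saved as a fraction of the NEW runtime (assumed headline convention)."""
    return (old - new) / new


def reduction_vs_old(old: float, new: float) -> float:
    """Time saved as a fraction of the OLD runtime (assumed profiler convention)."""
    return (old - new) / old


# Headline: 12.0 ms -> 10.8 ms
headline = speedup_vs_new(12.0, 10.8)
# Line profiler total: 59.6 ms -> 55.8 ms
profiler_total = reduction_vs_old(59.6, 55.8)
# node.type comparisons: 7.1 ms -> 5.9 ms
type_checks = reduction_vs_old(7.1, 5.9)

print(f"headline speedup:     {headline:.1%}")
print(f"profiler total:       {profiler_total:.1%}")
print(f"node.type comparison: {type_checks:.1%}")
```

Under these assumptions the three values round to about 11%, 6.4% and 17%, matching the figures quoted. Measured against the old runtime instead, the headline change would be a 10% reduction.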

Test Results:
Nearly all test cases show consistent 6-13% speedups; the one exception is the empty-source test, which is only 1.53% faster:

  • Large-scale tests (100+ fields/classes) see the biggest gains: 10.5-13.3% faster
  • Deep nesting benefits significantly: 12.6% faster with 20 nested levels
  • Small test cases still benefit: 6-10% faster

This optimization is particularly valuable for analyzing large Java codebases with many classes and fields, where the recursive tree traversal dominates runtime.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 119 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
import pytest
from codeflash.languages.java.parser import JavaAnalyzer, JavaFieldInfo

class TestJavaAnalyzerFindFieldsBasic:
    """Basic test cases for JavaAnalyzer.find_fields function."""

    def test_empty_source_code(self):
        """Test that empty source code returns an empty list of fields."""
        analyzer = JavaAnalyzer()
        codeflash_output = analyzer.find_fields(""); result = codeflash_output # 8.63μs -> 8.50μs (1.53% faster)

    def test_source_without_fields(self):
        """Test source code with class but no field declarations."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public void myMethod() {
                int x = 5;
            }
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 44.8μs -> 40.6μs (10.5% faster)

    def test_single_public_field(self):
        """Test extraction of a single public field."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public int myField;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 39.4μs -> 37.0μs (6.52% faster)

    def test_single_private_field(self):
        """Test extraction of a single private field."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            private String name;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 37.5μs -> 35.0μs (7.09% faster)

    def test_field_with_modifiers(self):
        """Test extraction of field with multiple modifiers."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            private static final int MAX_VALUE = 100;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 44.9μs -> 41.9μs (7.25% faster)

    def test_multiple_fields_different_types(self):
        """Test extraction of multiple fields with different types."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public int intField;
            public String stringField;
            private boolean boolField;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 62.9μs -> 58.7μs (7.11% faster)
        names = [field.name for field in result]

    def test_multiple_fields_single_declaration(self):
        """Test extraction of multiple fields declared in one statement."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public int a, b, c;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 48.1μs -> 44.1μs (9.05% faster)
        names = [field.name for field in result]
        # All should have the same type
        for field in result:
            pass

    def test_fields_in_nested_class(self):
        """Test extraction of fields from nested classes."""
        analyzer = JavaAnalyzer()
        source = """
        public class Outer {
            public int outerField;
            
            public class Inner {
                public String innerField;
            }
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 58.1μs -> 53.8μs (7.85% faster)
        names = [field.name for field in result]

    def test_protected_field(self):
        """Test extraction of protected field."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            protected double protectedField;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 38.0μs -> 35.3μs (7.43% faster)

    def test_field_with_complex_type(self):
        """Test extraction of field with generic/complex type."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            private java.util.List<String> myList;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 46.0μs -> 42.4μs (8.53% faster)

    def test_class_name_filtering(self):
        """Test filtering fields by class name."""
        analyzer = JavaAnalyzer()
        source = """
        public class ClassA {
            public int fieldA;
        }
        
        public class ClassB {
            public int fieldB;
        }
        """
        codeflash_output = analyzer.find_fields(source, class_name="ClassA"); result = codeflash_output # 53.6μs -> 49.6μs (7.97% faster)

class TestJavaAnalyzerFindFieldsEdgeCases:
    """Edge case test cases for JavaAnalyzer.find_fields function."""

    def test_whitespace_in_field_names(self):
        """Test fields with various whitespace in declaration."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public    int    myField   ;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 37.0μs -> 34.2μs (8.02% faster)

    def test_field_with_initialization(self):
        """Test field with initialization expression."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public int myField = 42;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 40.0μs -> 36.7μs (8.99% faster)

    def test_field_with_array_type(self):
        """Test field with array type."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public int[] arrayField;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 41.0μs -> 37.7μs (8.69% faster)

    def test_field_with_multi_dimensional_array(self):
        """Test field with multi-dimensional array type."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public String[][] matrix;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 41.9μs -> 38.5μs (8.71% faster)

    def test_field_with_primitive_types(self):
        """Test all primitive types."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public byte b;
            public short s;
            public int i;
            public long l;
            public float f;
            public double d;
            public boolean z;
            public char c;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 114μs -> 104μs (9.66% faster)
        type_names = [field.type_name for field in result]

    def test_field_line_numbers(self):
        """Test that field line numbers are correctly captured."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public int field1;
            public String field2;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 49.3μs -> 46.1μs (6.84% faster)
        for field in result:
            pass

    def test_field_source_text(self):
        """Test that field source text is captured."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public int myField;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 37.2μs -> 34.6μs (7.57% faster)

    def test_no_fields_in_interface(self):
        """Test parsing of interface with no field declarations."""
        analyzer = JavaAnalyzer()
        source = """
        public interface MyInterface {
            void myMethod();
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 27.8μs -> 25.5μs (8.86% faster)

    def test_field_in_static_context(self):
        """Test static field extraction."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public static int staticField = 10;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 42.9μs -> 40.1μs (7.15% faster)

    def test_final_field_extraction(self):
        """Test final field extraction."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public final String CONSTANT = "value";
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 44.0μs -> 41.0μs (7.28% faster)

    def test_field_with_generic_type_single_param(self):
        """Test field with single generic parameter."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            private List<String> items;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 41.3μs -> 38.6μs (6.82% faster)

    def test_field_with_generic_type_multiple_params(self):
        """Test field with multiple generic parameters."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            private Map<String, Integer> mapping;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 43.4μs -> 40.6μs (6.98% faster)

    def test_field_with_wildcard_generic(self):
        """Test field with wildcard generic type."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            private List<?> unknownList;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 41.2μs -> 37.8μs (8.95% faster)

    def test_class_name_filter_no_match(self):
        """Test filtering by class name when no match exists."""
        analyzer = JavaAnalyzer()
        source = """
        public class ClassA {
            public int fieldA;
        }
        """
        codeflash_output = analyzer.find_fields(source, class_name="NonExistentClass"); result = codeflash_output # 29.9μs -> 27.6μs (8.39% faster)

    def test_class_name_filter_nested_class(self):
        """Test filtering by nested class name."""
        analyzer = JavaAnalyzer()
        source = """
        public class Outer {
            public class Inner {
                public int innerField;
            }
        }
        """
        codeflash_output = analyzer.find_fields(source, class_name="Inner"); result = codeflash_output # 45.7μs -> 42.0μs (8.70% faster)

    def test_multiple_classes_with_same_field_names(self):
        """Test extraction when multiple classes have fields with same names."""
        analyzer = JavaAnalyzer()
        source = """
        public class ClassA {
            public int field;
        }
        
        public class ClassB {
            public String field;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 57.6μs -> 53.2μs (8.33% faster)

    def test_field_with_annotation(self):
        """Test field with annotation."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            @Deprecated
            public int oldField;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 41.6μs -> 38.7μs (7.50% faster)

    def test_multiple_annotations_on_field(self):
        """Test field with multiple annotations."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            @Override
            @Deprecated
            public int field;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 44.5μs -> 40.8μs (9.05% faster)

    def test_field_without_modifiers(self):
        """Test field with package-private (no modifier) access."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            int packagePrivateField;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 36.0μs -> 33.1μs (8.96% faster)

    def test_field_in_anonymous_class(self):
        """Test field extraction from anonymous class."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            Object obj = new Object() {
                public int anonField;
            };
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 57.1μs -> 53.1μs (7.65% faster)

class TestJavaAnalyzerFindFieldsLargeScale:
    """Large scale test cases for JavaAnalyzer.find_fields function."""

    def test_large_number_of_fields_in_class(self):
        """Test extraction of many fields from a single class."""
        analyzer = JavaAnalyzer()
        source = "public class MyClass {\n"
        field_count = 100
        for i in range(field_count):
            source += f"    public int field{i};\n"
        source += "}\n"
        
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 1.03ms -> 918μs (12.0% faster)
        
        # Verify all field names are present
        names = [field.name for field in result]
        for i in range(field_count):
            pass

    def test_large_number_of_classes(self):
        """Test extraction from many classes."""
        analyzer = JavaAnalyzer()
        source = ""
        class_count = 50
        for c in range(class_count):
            source += f"""
        public class Class{c} {{
            public int field{c};
        }}
        """
        
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 865μs -> 773μs (12.0% faster)
        
        # Verify all fields are present
        names = [field.name for field in result]
        for c in range(class_count):
            pass

    def test_deeply_nested_classes(self):
        """Test extraction from deeply nested class structures."""
        analyzer = JavaAnalyzer()
        source = "public class Level0 {\n"
        source += "    public int level0Field;\n"
        
        for level in range(1, 20):
            source += f"""
    public class Level{level} {{
        public int level{level}Field;
"""
        
        source += "}" * 20 + "\n"
        
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 352μs -> 312μs (12.6% faster)
        
        names = [field.name for field in result]

    def test_many_modifiers_combinations(self):
        """Test various combinations of modifiers."""
        analyzer = JavaAnalyzer()
        source = "public class MyClass {\n"
        
        # Generate various modifier combinations
        modifiers_list = [
            "public",
            "private",
            "protected",
            "public static",
            "private final",
            "protected static final",
        ]
        
        for idx, mods in enumerate(modifiers_list * 10):
            source += f"    {mods} int field{idx};\n"
        
        source += "}\n"
        
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 685μs -> 617μs (10.9% faster)
        
        # Verify modifier extraction
        static_count = sum(1 for field in result if field.is_static)
        final_count = sum(1 for field in result if field.is_final)

    def test_large_source_file(self):
        """Test processing of a large source file."""
        analyzer = JavaAnalyzer()
        # Create a source with multiple classes and fields
        source = ""
        total_fields = 0
        
        for class_idx in range(10):
            source += f"public class Class{class_idx} {{\n"
            
            # Add multiple fields to each class
            for field_idx in range(30):
                source += f"    public int field{class_idx}_{field_idx};\n"
                total_fields += 1
            
            source += "}\n\n"
        
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 3.16ms -> 2.85ms (10.5% faster)

    def test_all_primitive_types_multiple_times(self):
        """Test extraction of all primitive types repeated many times."""
        analyzer = JavaAnalyzer()
        source = "public class MyClass {\n"
        
        primitive_types = ["byte", "short", "int", "long", "float", "double", "boolean", "char"]
        field_count = 0
        
        for idx in range(80):
            ptype = primitive_types[idx % len(primitive_types)]
            source += f"    public {ptype} field{idx};\n"
            field_count += 1
        
        source += "}\n"
        
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 827μs -> 747μs (10.6% faster)
        
        # Verify all types are represented
        type_names = set(field.type_name for field in result)
        for ptype in primitive_types:
            pass

    def test_multiple_declarations_in_single_statement_many(self):
        """Test multiple fields declared in single statements."""
        analyzer = JavaAnalyzer()
        source = "public class MyClass {\n"
        
        total_fields = 0
        for stmt_idx in range(50):
            field_count = (stmt_idx % 5) + 1  # 1 to 5 fields per statement
            fields = ", ".join([f"field{stmt_idx}_{i}" for i in range(field_count)])
            source += f"    public int {fields};\n"
            total_fields += field_count
        
        source += "}\n"
        
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 993μs -> 892μs (11.3% faster)

    def test_large_complex_generic_types(self):
        """Test extraction of many complex generic field types."""
        analyzer = JavaAnalyzer()
        source = "public class MyClass {\n"
        
        generic_types = [
            "List<String>",
            "Map<String, Integer>",
            "Set<Long>",
            "Queue<Double>",
            "LinkedHashMap<String, Object>",
            "ConcurrentHashMap<Integer, String>",
        ]
        
        field_count = 0
        for idx in range(60):
            gtype = generic_types[idx % len(generic_types)]
            source += f"    private {gtype} field{idx};\n"
            field_count += 1
        
        source += "}\n"
        
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 864μs -> 770μs (12.3% faster)
        
        # Verify generic types are parsed
        type_names = [field.type_name for field in result]

    def test_performance_with_many_fields_and_filters(self):
        """Test performance when filtering by class name with many classes."""
        analyzer = JavaAnalyzer()
        source = ""
        target_class = "TargetClass"
        target_fields = 50
        
        # Create multiple classes
        for class_idx in range(100):
            if class_idx == 50:
                # Create target class with many fields
                source += f"public class {target_class} {{\n"
                for field_idx in range(target_fields):
                    source += f"    public int targetField{field_idx};\n"
                source += "}\n\n"
            else:
                # Create other classes with fewer fields
                source += f"public class OtherClass{class_idx} {{\n"
                source += f"    public int otherField{class_idx};\n"
                source += "}\n\n"
        
        codeflash_output = analyzer.find_fields(source, class_name=target_class); result = codeflash_output # 1.78ms -> 1.57ms (13.3% faster)

    def test_java_field_info_object_properties(self):
        """Test that JavaFieldInfo objects have all expected properties."""
        analyzer = JavaAnalyzer()
        source = """
        public class MyClass {
            public final int myField = 42;
        }
        """
        codeflash_output = analyzer.find_fields(source); result = codeflash_output # 44.8μs -> 41.6μs (7.66% faster)
        
        field = result[0]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
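
The closing comment describes a behavioral comparison between the original and optimized code. A minimal sketch of that idea follows. It is not Codeflash's actual mechanism: `find_mismatches` and the two toy field-name extractors are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple


def find_mismatches(
    original: Callable[..., Any],
    optimized: Callable[..., Any],
    cases: Sequence[Tuple[Any, ...]],
) -> List[Tuple[Any, ...]]:
    """Return the argument tuples for which the two callables disagree."""
    mismatches = []
    for args in cases:
        if original(*args) != optimized(*args):
            mismatches.append(args)
    return mismatches


def field_names_original(lines: List[str]) -> List[str]:
    """Toy baseline: take the last token of each line ending in ';'."""
    names = []
    for line in lines:
        if line.strip().endswith(";"):
            names.append(line.strip().rstrip(";").split()[-1])
    return names


def field_names_optimized(lines: List[str]) -> List[str]:
    """Toy rewrite: same result, stripping each line only once."""
    stripped = (line.strip() for line in lines)
    return [s.rstrip(";").split()[-1] for s in stripped if s.endswith(";")]


cases = [
    (["public int a;", "}"],),
    ([],),
]
assert find_mismatches(field_names_original, field_names_optimized, cases) == []
```

A comparison like this only shows agreement on the inputs that were tried, which is why the coverage figure in the report matters alongside the pass count.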

To edit these changes, run git checkout codeflash/optimize-pr1199-2026-02-03T10.35.55 and push.

@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Feb 3, 2026
@codeflash-ai codeflash-ai bot mentioned this pull request Feb 3, 2026

KRRT7 commented Feb 19, 2026

Closing stale bot PR.

@KRRT7 KRRT7 closed this Feb 19, 2026
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1199-2026-02-03T10.35.55 branch February 19, 2026 13:02