
⚡️ Speed up function _parse_optimization_source by 31% in PR #1199 (omni-java) #1322

Closed
codeflash-ai[bot] wants to merge 1 commit into omni-java from codeflash/optimize-pr1199-2026-02-03T19.29.36


@codeflash-ai
Contributor

@codeflash-ai codeflash-ai bot commented Feb 3, 2026

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.


📄 31% (1.31x) speedup for _parse_optimization_source in codeflash/languages/java/replacement.py

⏱️ Runtime: 14.7 milliseconds → 11.2 milliseconds (best of 231 runs)

📝 Explanation and details

The optimization achieves a 31% speedup (14.7ms → 11.2ms, about 24% less runtime) by eliminating redundant string operations in the _parse_optimization_source function, particularly when processing Java source code with multiple methods and fields.

Key Changes:

  1. Single-pass line splitting: The original code called new_source.splitlines(keepends=True) once for the target method and again for each helper method, so for h helper methods in an n-line source the splitting work grew as O(h·n), which becomes quadratic when the helper count grows with the file. The optimized version splits the source once and reuses the lines list for every method extraction, so splitting costs O(n) in total.

  2. Combined method extraction loop: Instead of two separate loops (one to find the target method, another to extract helpers), the optimization uses a single loop that extracts the target and the helpers in one pass, removing the second iteration over the method list.

  3. Type-checking guard in JavaAnalyzer: Added isinstance(source, str) checks before encoding in find_methods, find_classes, and find_fields. All current callers pass strings, so the guard does not change behavior today; it lets these methods accept pre-encoded bytes later instead of failing, since bytes objects have no encode() method.
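A minimal sketch of how the first two changes fit together, reconstructed only from the behavior the generated regression tests describe (it is not the code in replacement.py): the source is split once, and a single loop routes each method either to the target slot or to the helper list. ParsedOptimization here is a hypothetical stand-in whose field names follow the test comments; parse_optimization_source_sketch and ensure_bytes are illustrative names, and treating the first method with the target name as the target is an assumption. ensure_bytes only shows the shape of the isinstance guard from change 3.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


@dataclass
class ParsedOptimization:
    # Hypothetical stand-in; field names follow the generated tests' comments.
    target_method_source: str
    new_fields: list[str] = field(default_factory=list)
    new_helper_methods: list[str] = field(default_factory=list)


def parse_optimization_source_sketch(
    new_source: str, target_name: str, analyzer: Any
) -> ParsedOptimization:
    # No enclosing class: the whole snippet is treated as the target method.
    if not analyzer.find_classes(new_source):
        return ParsedOptimization(target_method_source=new_source)

    # Split once; every extraction below slices this shared list.
    lines = new_source.splitlines(keepends=True)
    target_source: str | None = None
    helpers: list[str] = []

    # Single pass: each method goes either to the target slot or to the helper list.
    for method in analyzer.find_methods(new_source):
        # Line numbers are 1-based; start at the Javadoc block when there is one.
        start = (method.javadoc_start_line or method.start_line) - 1
        text = "".join(lines[start:method.end_line])
        # Assumption: the first method with the target name is the target.
        if method.name == target_name and target_source is None:
            target_source = text
        else:
            helpers.append(text)

    # Keep only fields that actually carry source text.
    new_fields = [f.source_text for f in analyzer.find_fields(new_source) if f.source_text]
    return ParsedOptimization(
        # Without a matching method, fall back to the full source.
        target_method_source=target_source if target_source is not None else new_source,
        new_fields=new_fields,
        new_helper_methods=helpers,
    )


def ensure_bytes(source: str | bytes) -> bytes:
    # Shape of the isinstance guard: encode str, pass bytes through unchanged.
    return source.encode("utf-8") if isinstance(source, str) else source


# Usage on a tiny class: a target with a Javadoc block plus one helper.
src_lines = [
    "public class C {\n",
    "  /** doc */\n",
    "  public void target() {}\n",
    "  public void helper() {}\n",
    "}\n",
]
analyzer = SimpleNamespace(
    find_classes=lambda src: [object()],
    find_methods=lambda src: [
        SimpleNamespace(name="target", javadoc_start_line=2, start_line=3, end_line=3),
        SimpleNamespace(name="helper", javadoc_start_line=None, start_line=4, end_line=4),
    ],
    find_fields=lambda src: [],
)
result = parse_optimization_source_sketch("".join(src_lines), "target", analyzer)
print(result.target_method_source, end="")  # prints the Javadoc line and the target line
print(result.new_helper_methods)  # ['  public void helper() {}\n']
```

Because lines is computed once, each extraction is just a slice and a join over that method's own lines, so the loop's cost tracks the total size of the extracted methods rather than the helper count times the file size.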

Why This Improves Performance:

The main avoidable cost in the original code was repeated string processing: a potentially large source file was split into lines once per helper method instead of once in total. In the original version's line profile, new_source.splitlines(keepends=True) accounted for ~8.8% of total function time (3.6ms across 282 helper-method calls, plus additional calls for the target). Performing this split just once eliminates that repeated work.
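To make the cost of re-splitting concrete, here is a small micro-benchmark comparing the two extraction patterns on a synthetic class body with 200 single-line methods. The helper names (build_source, extract_resplit, extract_single_split) are hypothetical, the code mirrors the patterns described in Key Changes rather than the real function, and absolute timings depend on the machine; only the equality of the two outputs is guaranteed.

```python
from __future__ import annotations

import timeit


def build_source(num_methods: int) -> tuple[str, list[tuple[int, int]]]:
    # Hypothetical input: one single-line method per line; spans are 1-based (start, end).
    lines = [f"  void m{i}() {{}}\n" for i in range(num_methods)]
    spans = [(i + 1, i + 1) for i in range(num_methods)]
    return "".join(lines), spans


def extract_resplit(source: str, spans: list[tuple[int, int]]) -> list[str]:
    # Original pattern: re-split the whole source for every method.
    return ["".join(source.splitlines(keepends=True)[s - 1:e]) for s, e in spans]


def extract_single_split(source: str, spans: list[tuple[int, int]]) -> list[str]:
    # Optimized pattern: split once, then slice the shared list per method.
    lines = source.splitlines(keepends=True)
    return ["".join(lines[s - 1:e]) for s, e in spans]


source, spans = build_source(200)
# Both patterns must extract exactly the same text.
assert extract_resplit(source, spans) == extract_single_split(source, spans)

for name, fn in (("re-split", extract_resplit), ("single split", extract_single_split)):
    seconds = timeit.timeit(lambda: fn(source, spans), number=20)
    print(f"{name}: {seconds / 20 * 1e3:.3f} ms per call")
```

The re-split variant does work proportional to the method count times the source length, while the single-split variant does one O(n) split plus per-method slicing, which is the gap the large-scale test exposes.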

Test Case Performance:

The improvement is most dramatic for code with many methods. The test_large_scale_many_helper_methods_and_fields_performance_and_correctness test shows a 3969% speedup (3.17ms → 78.0μs) when processing 200 helper methods and 200 fields. Smaller test cases with at most two methods improve by 4-14%, while the no-class case is unchanged (1.72μs in both versions).
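The headline figures mix two different percentages. A quick check against the reported timings, using illustrative helper names: the 31% figure is a speedup, (before / after - 1) × 100, which corresponds to roughly 24% less runtime, and the same speedup formula reproduces the per-test numbers.

```python
def speedup_percent(before: float, after: float) -> float:
    # How much faster the new version is: (before / after - 1) * 100.
    return (before / after - 1) * 100


def runtime_reduction_percent(before: float, after: float) -> float:
    # Share of the original runtime that was removed: (1 - after / before) * 100.
    return (1 - after / before) * 100


print(round(speedup_percent(14.7, 11.2)))               # 31
print(round(runtime_reduction_percent(14.7, 11.2), 1))  # 23.8
print(round(speedup_percent(6.39, 5.61), 1))            # 13.9
print(round(speedup_percent(3170.0, 78.0)))             # 3964 (both inputs in μs)
```

The recomputed 3964% differs slightly from the reported 3969%, presumably because the report works from unrounded timings.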

Impact on Workloads:

This optimization directly benefits Java code analysis workflows that parse optimization suggestions containing multiple methods and fields, a common scenario when AI-generated optimizations include helper methods or require additional class members. The single-pass approach scales linearly with the size of the source, whereas the original approach's splitting cost grew with the number of helper methods times the source size.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 6 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
from __future__ import annotations  # postpone evaluation of annotations to allow forward references

from types import SimpleNamespace  # convenient lightweight container for attributes
from typing import List  # type hinting used in tests

import pytest  # used for our unit tests
from codeflash.languages.java.replacement import _parse_optimization_source

def test_no_classes_returns_full_source_and_no_members():
    # Scenario: new_source contains only a method (no enclosing class).
    # Expectation: Since analyzer.find_classes returns an empty list,
    # the function should return the entire new_source as the target_method_source
    # and no new fields or helper methods should be returned.
    new_source = "public void optimized() { /* body */ }\n"
    target_name = "optimized"

    # Analyzer with find_classes returning empty list; other methods should not be called.
    analyzer = SimpleNamespace()
    analyzer.find_classes = lambda src: []  # indicates no class wrapper
    # If these were called unexpectedly, raise to catch the bug.
    analyzer.find_methods = lambda src: (_ for _ in ()).throw(AssertionError("find_methods should not be called"))
    analyzer.find_fields = lambda src: (_ for _ in ()).throw(AssertionError("find_fields should not be called"))

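    codeflash_output = _parse_optimization_source(new_source, target_name, analyzer); result = codeflash_output # 1.72μs -> 1.72μs (0.000% faster)
    assert result.target_method_source == new_source
    assert not result.new_fields
    assert not result.new_helper_methods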

def test_class_with_target_method_including_javadoc_helpers_and_fields():
    # Scenario: new_source is a class containing:
    # - A Javadoc block + target method
    # - A helper method
    # - A field with source_text
    #
    # The analyzer will report classes present and provide method/field metadata.
    # Expectation: target_method_source must include javadoc + method lines;
    # helper method should be returned as a separate element; field source_text should be captured.
    lines = [
        "/** Class Javadoc */\n",     # 1
        "public class C {\n",        # 2
        "  /** target javadoc */\n", # 3
        "  public void target() {}\n",# 4
        "  public void helper() {}\n",# 5
        "  static int F = 1;\n",     # 6
        "}\n",                       # 7
    ]
    new_source = "".join(lines)
    target_name = "target"

    # Build method metadata as the analyzer would provide.
    target_method = SimpleNamespace(
        name="target",
        javadoc_start_line=3,  # include the Javadoc (line 3)
        start_line=4,
        end_line=4,  # end_line should include the method line; extraction uses slice up to end_line
    )
    # Helper method single-line (line 5)
    helper_method = SimpleNamespace(
        name="helper",
        javadoc_start_line=None,
        start_line=5,
        end_line=5,
    )
    # Field metadata as the analyzer would provide.
    field = SimpleNamespace(source_text=lines[5])  # "  static int F = 1;\n"

    analyzer = SimpleNamespace()
    analyzer.find_classes = lambda src: [object()]  # non-empty indicates presence of class
    analyzer.find_methods = lambda src: [target_method, helper_method]
    analyzer.find_fields = lambda src: [field]

    codeflash_output = _parse_optimization_source(new_source, target_name, analyzer); result = codeflash_output # 6.39μs -> 5.61μs (13.9% faster)

    # Target method source should include the javadoc line (index 2) and method line (index 3).
    expected_target = "".join(lines[2:4])  # lines[2] and lines[3]

    # Helper methods list should contain the helper method source.
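    expected_helper = "".join(lines[4:5])  # line index 4 only

    assert result.target_method_source == expected_target
    assert result.new_helper_methods == [expected_helper]
    assert result.new_fields == [field.source_text]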

def test_when_target_missing_all_methods_become_helpers_and_target_remains_full_source():
    # Scenario: The provided class does not contain a method with the target name.
    # Expectation:
    # - Since target_method not found, target_method_source should remain the original new_source.
    # - All returned methods should be treated as helper methods (i.e., included in new_helper_methods).
    lines = [
        "public class C {\n",         # 1
        "  public void a() {}\n",     # 2
        "  public void b() {}\n",     # 3
        "}\n",                        # 4
    ]
    new_source = "".join(lines)
    target_name = "nonexistent"

    method_a = SimpleNamespace(name="a", javadoc_start_line=None, start_line=2, end_line=2)
    method_b = SimpleNamespace(name="b", javadoc_start_line=None, start_line=3, end_line=3)

    analyzer = SimpleNamespace()
    analyzer.find_classes = lambda src: [object()]  # indicate class presence
    analyzer.find_methods = lambda src: [method_a, method_b]
    analyzer.find_fields = lambda src: []  # no fields

    codeflash_output = _parse_optimization_source(new_source, target_name, analyzer); result = codeflash_output # 5.14μs -> 4.60μs (11.7% faster)

    # Both methods should be returned as helpers (order preserved)
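    expected_helpers = ["".join(lines[1:2]), "".join(lines[2:3])]
    assert result.target_method_source == new_source
    assert result.new_helper_methods == expected_helpers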

def test_fields_with_none_or_empty_source_are_filtered_out():
    # Scenario: fields returned by the analyzer include None and empty strings.
    # Expectation: Only truthy source_text values are appended to new_fields.
    new_source = "public class C {}\n"
    target_name = "irrelevant"

    # Fields: one None, one empty string, one valid string.
    field_none = SimpleNamespace(source_text=None)
    field_empty = SimpleNamespace(source_text="")
    field_valid = SimpleNamespace(source_text="private static final int X = 42;\n")

    analyzer = SimpleNamespace()
    analyzer.find_classes = lambda src: [object()]
    analyzer.find_methods = lambda src: []  # no methods
    analyzer.find_fields = lambda src: [field_none, field_empty, field_valid]

    codeflash_output = _parse_optimization_source(new_source, target_name, analyzer); result = codeflash_output # 2.90μs -> 2.75μs (5.79% faster)
    assert result.new_fields == [field_valid.source_text]

def test_javadoc_missing_uses_start_line_for_extraction():
    # Scenario: A method has no javadoc_start_line (None), so extraction should use start_line.
    # Expectation: The method slice should correspond exactly to start_line..end_line.
    lines = [
        "public class C {\n",        # 1
        "  // some comment\n",       # 2
        "  public void foo() {}\n",  # 3
        "}\n",                       # 4
    ]
    new_source = "".join(lines)
    target_name = "foo"

    method = SimpleNamespace(name="foo", javadoc_start_line=None, start_line=3, end_line=3)

    analyzer = SimpleNamespace()
    analyzer.find_classes = lambda src: [object()]
    analyzer.find_methods = lambda src: [method]
    analyzer.find_fields = lambda src: []

    codeflash_output = _parse_optimization_source(new_source, target_name, analyzer); result = codeflash_output # 4.12μs -> 3.96μs (4.04% faster)

    expected = "".join(lines[2:3])  # only the method line
    assert result.target_method_source == expected

def test_large_scale_many_helper_methods_and_fields_performance_and_correctness():
    # Scenario: Large but moderate number of helper methods and fields to validate scalability.
    # Constraint: Keep loops under 1000 (we will generate 200 helpers and 200 fields).
    num_helpers = 200
    num_fields = 200

    lines: List[str] = []
    # Build a simple source where each method/field occupies a single line.
    # Start with class opening
    lines.append("public class Big {\n")  # line 1

    current_line = 2
    helper_methods = []
    for i in range(num_helpers):
        method_line = f"  void helper_{i}() {{ /* {i} */ }}\n"
        lines.append(method_line)
        # Each method occupies a single line; start_line and end_line are the same number.
        method_meta = SimpleNamespace(
            name=f"helper_{i}",
            javadoc_start_line=None,
            start_line=current_line,
            end_line=current_line,
        )
        helper_methods.append(method_meta)
        current_line += 1

    # Add the target method last among methods
    target_line = "  void targetMethod() { /* target */ }\n"
    lines.append(target_line)
    target_meta = SimpleNamespace(name="targetMethod", javadoc_start_line=None, start_line=current_line, end_line=current_line)
    current_line += 1

    # Add many fields, each on a single line
    fields_meta = []
    for i in range(num_fields):
        field_line = f"  static int F_{i} = {i};\n"
        lines.append(field_line)
        field_meta = SimpleNamespace(source_text=field_line)
        fields_meta.append(field_meta)
        current_line += 1

    lines.append("}\n")  # class close
    new_source = "".join(lines)
    target_name = "targetMethod"

    # Analyzer returns class present, combined methods (helpers + target)
    analyzer = SimpleNamespace()
    analyzer.find_classes = lambda src: [object()]
    # preserve order: helpers then target
    analyzer.find_methods = lambda src: helper_methods + [target_meta]
    analyzer.find_fields = lambda src: fields_meta

    codeflash_output = _parse_optimization_source(new_source, target_name, analyzer); result = codeflash_output # 3.17ms -> 78.0μs (3969% faster)

    # Expectation:
    # - target_method_source should equal the single line for target_meta.
    #   Compute the expected slice with the same logic as the function:
    expected_target_start = (target_meta.javadoc_start_line or target_meta.start_line) - 1
    expected_target_end = target_meta.end_line
    expected_target = "".join(new_source.splitlines(keepends=True)[expected_target_start:expected_target_end])

    # A quick content sanity check for a couple of entries
    # First helper should be present and correctly extracted
    first_helper_expected = new_source.splitlines(keepends=True)[1]  # second line in file (index 1)

    # First field should match
    # Find index of first field in the built lines: it's after 1 + num_helpers + 1 (class + helpers + target)
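    first_field_index = 1 + num_helpers + 1

    assert result.target_method_source == expected_target
    assert len(result.new_helper_methods) == num_helpers
    assert result.new_helper_methods[0] == first_helper_expected
    assert len(result.new_fields) == num_fields
    assert result.new_fields[0] == lines[first_field_index]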
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-pr1199-2026-02-03T19.29.36` and push.

@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Feb 3, 2026
@codeflash-ai codeflash-ai bot mentioned this pull request Feb 3, 2026
@KRRT7
Collaborator

KRRT7 commented Feb 19, 2026

Closing stale bot PR.

@KRRT7 KRRT7 closed this Feb 19, 2026
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1199-2026-02-03T19.29.36 branch February 19, 2026 13:00