
⚡️ Speed up method JavaSupport.normalize_code by 43% in PR #1199 (omni-java) #1309

Closed
codeflash-ai[bot] wants to merge 1 commit into omni-java from codeflash/optimize-pr1199-2026-02-03T12.05.28


codeflash-ai bot commented on Feb 3, 2026

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.


📄 43% speedup (1.43x faster) for JavaSupport.normalize_code in codeflash/languages/java/support.py

⏱️ Runtime: 1.57 ms → 1.10 ms (best of 250 runs)
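The headline and per-test percentages are consistent with measuring speedup relative to the optimized runtime, i.e. (original - optimized) / optimized. That formula is inferred from the reported numbers, not taken from Codeflash's source. A quick check in Python:

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup, inferred from this report: (original - optimized) / optimized."""
    return (original - optimized) / optimized * 100


def speed_ratio(original: float, optimized: float) -> float:
    """How many times faster the optimized version runs."""
    return original / optimized


# Overall runtime from the report (milliseconds).
print(round(speedup_percent(1.57, 1.10)))  # 43
print(round(speed_ratio(1.57, 1.10), 2))   # 1.43
# Per-test timing for test_single_line_comment_at_end (microseconds).
print(round(speedup_percent(5.52, 2.53)))  # 118
```

A negative result corresponds to the "slower" entries: speedup_percent(1.05, 1.07) is about -1.87, matching the 1.87% slowdown reported for test_empty_string.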

📝 Explanation and details

The optimized code runs about 43% faster (1.57 ms → 1.10 ms) by adding a fast path for stripping line comments (//) from Java code.

What changed:
The key change is an early-exit check for line comments. Before entering the expensive character-by-character string-parsing loop, the code now checks whether any double quote appears before the first //:

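if "//" in line:
    comment_pos = line.find("//")
    prefix = line[:comment_pos]
    if '"' not in prefix:
        # Fast path: no string literal can be open before the first "//",
        # so everything from "//" onward is a comment.
        line = prefix
    else:
        # Slow path: fall back to the original character-by-character parser
        # (elided here) to tell "//" inside a string from a real comment.
        ...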
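The same idea can be written as a standalone helper. This is a sketch, not the PR's code: strip_line_comment is a hypothetical name, and the real normalize_code also handles block comments and whitespace, which are left out here. It uses a single find() call instead of "//" in line followed by find(), so the line is scanned once, and its slow path also skips char literals such as '"' (an assumption; the PR does not say how the original parser treats them):

```python
def strip_line_comment(line: str) -> str:
    """Remove a trailing // comment from one line of Java (hypothetical helper)."""
    comment_pos = line.find("//")  # single C-level scan
    if comment_pos == -1:
        return line  # no comment marker anywhere
    prefix = line[:comment_pos]
    if '"' not in prefix:
        # Fast path: no string literal can be open at the first "//",
        # so it must start a real comment.
        return prefix

    # Slow path: walk the line and track whether we are inside a
    # "string" or 'c' literal, honoring backslash escapes.
    quote = None  # quote character of the literal we are in, if any
    escaped = False
    for i, ch in enumerate(line):
        if quote is not None:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif line.startswith("//", i):
            return line[:i]
    return line  # every "//" was inside a literal
```

Block comments are out of scope for this sketch: a line such as /* a // b */ int y; would be cut at the //, so a full implementation has to decide in which order block and line comments are processed.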

Why this is faster:

  1. String operations vs. character iteration: the optimized version relies on the in operator and str.find(), which are implemented in C and scan the line in a single call instead of stepping through it one character at a time in Python bytecode.

  2. Early exit avoids expensive work: when no double quote appears before the first //, the code skips:

    • The enumerate(line) loop that inspects every character
    • Multiple conditional checks per character (escape handling, quote tracking, string state management)
    • String slicing operations to check for // at each position
  3. Common-case optimization: line comments often follow code that contains no string literal (e.g., int x = 5; // comment). The generated tests show the payoff on such lines: test_single_line_comment_at_end is 118% faster and test_multiple_line_comments is 124% faster.
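To see the per-character cost in isolation, a rough micro-benchmark can compare one str.find() call with a Python-level loop that slices two characters at every position. Both functions are simplified illustrations written for this comparison, not code from the PR, and absolute timings will vary by machine and Python version:

```python
import timeit
from functools import partial

LINE = "int x = 5; // this is a comment"


def cut_with_find(line: str) -> str:
    """One C-level scan for the first '//'."""
    pos = line.find("//")
    return line if pos == -1 else line[:pos]


def cut_per_char(line: str) -> str:
    """Python-level loop: one iteration and one slice per character."""
    for i in range(len(line)):
        if line[i:i + 2] == "//":
            return line[:i]
    return line


if __name__ == "__main__":
    for fn in (cut_with_find, cut_per_char):
        seconds = timeit.timeit(partial(fn, LINE), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f} s for 100k calls")
```

Both functions return the same result; only the amount of work done in Python bytecode differs.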

Performance breakdown from test results:

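  • Small inputs with line comments and no quotes before //: roughly 22-153% faster (e.g., test_line_comment_entire_line: 21.7%, test_mixed_comments: 153%)
  • Cases with a string literal before the comment: 3-10% slower, because the fast-path check runs first and the full parser still has to run afterwards; a modest cost given the gains above
  • Large-scale scenarios: from 3.5% slower (test_large_code_with_long_strings) to 175% faster (test_mixed_block_and_line_comments_large), depending on line-comment density; inputs with few or no line comments (test_low_comment_density, test_many_consecutive_block_comments) are essentially unchanged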

The optimization pays off most on inputs with many line comments (common in well-documented code): every line that takes the fast path skips the per-character loop, and those savings add up across hundreds of lines.
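Whether a given codebase benefits depends on how often the fast path fires. A rough way to estimate that is a hypothetical helper, fast_path_share, that applies the same "no double quote before the first //" test to every line containing //. It ignores block comments and multi-line state, so treat the result as an approximation rather than a prediction of the real function's behavior:

```python
def fast_path_share(source: str) -> float:
    """Fraction of '//'-containing lines whose text before the first '//' has no '"'."""
    with_marker = [line for line in source.split("\n") if "//" in line]
    if not with_marker:
        return 0.0  # no line comments at all, so the fast path never fires
    fast = sum(1 for line in with_marker if '"' not in line[: line.find("//")])
    return fast / len(with_marker)


sample = 'int x = 5; // a\nString u = "http://x"; // b\nint y;\n// c'
print(round(fast_path_share(sample), 3))  # 0.667: 2 of 3 lines take the fast path
```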

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 76 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests:
import pytest
from codeflash.languages.java.support import JavaSupport

class TestJavaSupportNormalizeCodeBasic:
    """Test basic functionality of JavaSupport.normalize_code."""

    def test_empty_string(self):
        """Test normalization of empty string."""
        support = JavaSupport()
        codeflash_output = support.normalize_code(""); result = codeflash_output # 1.05μs -> 1.07μs (1.87% slower)

    def test_whitespace_only(self):
        """Test normalization of whitespace-only input."""
        support = JavaSupport()
        codeflash_output = support.normalize_code("   \n  \n   "); result = codeflash_output # 2.29μs -> 2.29μs (0.000% faster)

    def test_single_newline(self):
        """Test normalization of single newline."""
        support = JavaSupport()
        codeflash_output = support.normalize_code("\n"); result = codeflash_output # 1.55μs -> 1.46μs (6.22% faster)

    def test_simple_code_no_comments(self):
        """Test normalization of simple Java code with no comments."""
        support = JavaSupport()
        source = "public class Hello {\n    public void greet() {\n        System.out.println(\"Hi\");\n    }\n}"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 3.60μs -> 3.46μs (4.05% faster)
        expected = "public class Hello {\npublic void greet() {\nSystem.out.println(\"Hi\");\n}\n}"

    def test_single_line_comment_at_end(self):
        """Test normalization with line comment at end of code line."""
        support = JavaSupport()
        source = "int x = 5; // this is a comment"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 5.52μs -> 2.53μs (118% faster)

    def test_line_comment_entire_line(self):
        """Test normalization when entire line is a comment."""
        support = JavaSupport()
        source = "int x = 5;\n// this is a comment\nint y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 3.93μs -> 3.23μs (21.7% faster)
        expected = "int x = 5;\nint y = 10;"

    def test_multiple_line_comments(self):
        """Test normalization with multiple line comments."""
        support = JavaSupport()
        source = "// Comment 1\nint x = 5; // Comment 2\nint y = 10; // Comment 3"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 8.98μs -> 4.01μs (124% faster)
        expected = "int x = 5;\nint y = 10;"

    def test_single_line_block_comment(self):
        """Test normalization with block comment on single line."""
        support = JavaSupport()
        source = "int x = 5; /* block comment */ int y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 3.27μs -> 3.22μs (1.83% faster)

    def test_multiline_block_comment(self):
        """Test normalization with block comment spanning multiple lines."""
        support = JavaSupport()
        source = "int x = 5;\n/* this is a\nmulti-line comment */\nint y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 4.12μs -> 4.11μs (0.243% faster)
        expected = "int x = 5;\nint y = 10;"

    def test_mixed_comments(self):
        """Test normalization with both line and block comments."""
        support = JavaSupport()
        source = "// Line comment\nint x = 5; /* block */ int y = 10; // end"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 11.4μs -> 4.52μs (153% faster)

    def test_string_with_double_slash(self):
        """Test that double slash inside string is not treated as comment."""
        support = JavaSupport()
        source = 'String url = "http://example.com"; // actual comment'
        codeflash_output = support.normalize_code(source); result = codeflash_output # 7.34μs -> 8.15μs (9.86% slower)

    def test_string_with_block_comment_markers(self):
        """Test that /* and */ inside string are not treated as comment markers."""
        support = JavaSupport()
        source = 'String pattern = "/* not */ a comment"; // real comment'
        codeflash_output = support.normalize_code(source); result = codeflash_output # 9.11μs -> 9.80μs (7.05% slower)

    def test_escaped_quote_in_string(self):
        """Test that escaped quotes don't break string detection."""
        support = JavaSupport()
        source = 'String s = "He said \\"Hello\\""; // comment'
        codeflash_output = support.normalize_code(source); result = codeflash_output # 6.95μs -> 7.60μs (8.56% slower)

    def test_leading_and_trailing_whitespace_removed(self):
        """Test that leading and trailing whitespace is removed from each line."""
        support = JavaSupport()
        source = "   int x = 5;   \n    int y = 10;    "
        codeflash_output = support.normalize_code(source); result = codeflash_output # 2.52μs -> 2.45μs (2.89% faster)
        expected = "int x = 5;\nint y = 10;"

class TestJavaSupportNormalizeCodeEdgeCases:
    """Test edge cases for JavaSupport.normalize_code."""

    def test_unclosed_block_comment(self):
        """Test handling of unclosed block comment (removes rest of code)."""
        support = JavaSupport()
        source = "int x = 5;\n/* unclosed comment\nint y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 3.23μs -> 3.15μs (2.54% faster)

    def test_unclosed_block_comment_to_end(self):
        """Test unclosed block comment to end of file."""
        support = JavaSupport()
        source = "int x = 5; /* starts comment"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 2.71μs -> 2.73μs (0.734% slower)

    def test_multiple_block_comments_on_one_line(self):
        """Test multiple block comments on the same line."""
        support = JavaSupport()
        source = "int x = /* c1 */ 5 /* c2 */ ; int y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 3.10μs -> 3.17μs (1.93% slower)

    def test_block_comment_immediately_after_code(self):
        """Test block comment immediately following code on same line."""
        support = JavaSupport()
        source = "int x = 5;/* comment */int y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 3.06μs -> 3.01μs (1.63% faster)

    def test_empty_string_literal(self):
        """Test handling of empty string literal."""
        support = JavaSupport()
        source = 'String s = ""; // comment'
        codeflash_output = support.normalize_code(source); result = codeflash_output # 5.89μs -> 6.49μs (9.26% slower)

    def test_string_with_escaped_backslash(self):
        """Test string with escaped backslash followed by quote."""
        support = JavaSupport()
        source = 'String path = "C:\\\\folder"; // comment'
        codeflash_output = support.normalize_code(source); result = codeflash_output # 7.02μs -> 7.71μs (8.96% slower)

    def test_multiple_strings_on_one_line(self):
        """Test multiple string literals on one line with comments."""
        support = JavaSupport()
        source = 'String s1 = "http://a"; String s2 = "http://b"; // comment'
        codeflash_output = support.normalize_code(source); result = codeflash_output # 9.33μs -> 9.91μs (5.86% slower)

    def test_consecutive_empty_lines_removed(self):
        """Test that consecutive empty lines are all removed."""
        support = JavaSupport()
        source = "int x = 5;\n\n\nint y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 2.67μs -> 2.71μs (1.15% slower)
        expected = "int x = 5;\nint y = 10;"

    def test_block_comment_start_and_end_different_lines(self):
        """Test block comment that spans multiple lines."""
        support = JavaSupport()
        source = "int x = 5;\n/*\ncomment line 1\ncomment line 2\n*/\nint y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 4.27μs -> 4.38μs (2.51% slower)
        expected = "int x = 5;\nint y = 10;"

    def test_line_with_only_line_comment(self):
        """Test line containing only a line comment is removed."""
        support = JavaSupport()
        source = "int x = 5;\n   // just a comment   \nint y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 4.87μs -> 3.33μs (46.4% faster)
        expected = "int x = 5;\nint y = 10;"

    def test_block_comment_with_no_closing_on_same_line(self):
        """Test block comment start with no closing on same line."""
        support = JavaSupport()
        source = "int x = 5; /* starts here\nint y = 10; */ more code"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 3.90μs -> 3.74μs (4.31% faster)

    def test_double_slash_after_block_comment_end(self):
        """Test line comment after block comment ends."""
        support = JavaSupport()
        source = "int x = /* block */ 5; // line comment"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 8.49μs -> 3.80μs (123% faster)

    def test_block_comment_end_then_line_comment(self):
        """Test block comment closure followed by line comment."""
        support = JavaSupport()
        source = "int x = 5;\n/* block */\nint y = 10; // line"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 7.55μs -> 4.36μs (73.3% faster)
        expected = "int x = 5;\nint y = 10;"

    def test_string_containing_both_slashes_and_comment(self):
        """Test string with // followed by actual line comment."""
        support = JavaSupport()
        source = 'String url = "http://site"; int x = 5; // real comment'
        codeflash_output = support.normalize_code(source); result = codeflash_output # 8.63μs -> 9.38μs (8.02% slower)

    def test_whitespace_between_operators(self):
        """Test that whitespace handling preserves code structure."""
        support = JavaSupport()
        source = "int x   =   5   ;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 1.67μs -> 1.61μs (3.72% faster)

    def test_tabs_and_spaces_mixed(self):
        """Test handling of mixed tabs and spaces."""
        support = JavaSupport()
        source = "\t\tint x = 5;\n    int y = 10;"
        codeflash_output = support.normalize_code(source); result = codeflash_output # 2.47μs -> 2.38μs (4.17% faster)
        expected = "int x = 5;\nint y = 10;"

class TestJavaSupportNormalizeCodeLargeScale:
    """Test large scale scenarios for JavaSupport.normalize_code."""

    def test_large_code_with_many_comments(self):
        """Test normalization of large code with many comments."""
        support = JavaSupport()
        # Build 500+ lines of Java code with interspersed comments
        lines = []
        for i in range(250):
            lines.append(f"// Comment {i}")
            lines.append(f"int var{i} = {i};")
            if i % 3 == 0:
                lines.append(f"/* Block comment {i} */")
            if i % 5 == 0:
                lines.append(f"String str{i} = \"value{i}\"; // line comment")
        
        source = "\n".join(lines)
        codeflash_output = support.normalize_code(source); result = codeflash_output # 399μs -> 362μs (10.0% faster)
        
        # Count lines - should have no comment-only lines
        result_lines = result.split("\n")

    def test_large_code_with_long_strings(self):
        """Test normalization with very long string literals."""
        support = JavaSupport()
        long_string = "x" * 1000
        source = f'String s = "{long_string}"; // comment\nint x = 5;'
        codeflash_output = support.normalize_code(source); result = codeflash_output # 67.2μs -> 69.6μs (3.48% slower)

    def test_low_comment_density(self):
        """Test performance with low comment density."""
        support = JavaSupport()
        lines = []
        for i in range(500):
            lines.append(f"int var{i} = {i};")
        lines.append("// Single comment at end")
        
        source = "\n".join(lines)
        codeflash_output = support.normalize_code(source); result = codeflash_output # 72.0μs -> 71.2μs (1.18% faster)
        
        result_lines = result.split("\n")

    def test_high_comment_density(self):
        """Test performance with high comment density."""
        support = JavaSupport()
        lines = []
        for i in range(250):
            lines.append(f"// Comment line 1 for {i}")
            lines.append(f"// Comment line 2 for {i}")
            lines.append(f"int var{i} = {i};")
            lines.append(f"// Comment line 3 for {i}")
        
        source = "\n".join(lines)
        codeflash_output = support.normalize_code(source); result = codeflash_output # 373μs -> 232μs (60.9% faster)
        
        result_lines = result.split("\n")

    def test_mixed_block_and_line_comments_large(self):
        """Test handling of mixed comment types at scale."""
        support = JavaSupport()
        lines = []
        for i in range(100):
            lines.append(f"int x{i} = {i}; // line {i}")
            lines.append(f"/* block {i} */")
            lines.append(f"int y{i} = {i * 2}; /* inline {i} */")
        
        source = "\n".join(lines)
        codeflash_output = support.normalize_code(source); result = codeflash_output # 372μs -> 135μs (175% faster)

    def test_large_code_preserves_semantics(self):
        """Test that normalization preserves code semantics at scale."""
        support = JavaSupport()
        source = """
        // Package declaration
        package com.example;
        
        /* Imports */
        import java.util.List;
        import java.util.ArrayList;
        
        // Class definition
        public class Calculator {
            // Constructor comment
            public Calculator() {}
            
            /* Main method */
            public int add(int a, int b) { // line comment
                /* Calculate sum */
                return a + b; // return statement
            }
            
            // Another method
            public int multiply(int a, int b) {
                return a * b;
            }
        }
        """
        
        codeflash_output = support.normalize_code(source); result = codeflash_output # 33.4μs -> 13.7μs (145% faster)

    def test_many_consecutive_block_comments(self):
        """Test handling of many consecutive block comments."""
        support = JavaSupport()
        lines = []
        for i in range(100):
            lines.append(f"/* Comment block {i} */")
            lines.append(f"int v{i} = {i};")
        
        source = "\n".join(lines)
        codeflash_output = support.normalize_code(source); result = codeflash_output # 60.6μs -> 60.6μs (0.097% faster)
        
        result_lines = result.split("\n")

    def test_deeply_nested_logical_structure(self):
        """Test normalization of code with deeply nested structures."""
        support = JavaSupport()
        source = """
        // Outer class
        public class Outer {
            // Inner class
            public class Inner {
                // Method
                public void method() {
                    // If statement
                    if (true) {
                        // For loop
                        for (int i = 0; i < 10; i++) {
                            // Do something
                            System.out.println("i"); // print statement
                        }
                    }
                }
            }
        }
        """
        
        codeflash_output = support.normalize_code(source); result = codeflash_output # 36.9μs -> 20.4μs (81.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
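The comment above describes a differential check: the same inputs go through the original and the optimized implementation, and the outputs must match. A minimal sketch of that idea, where find_mismatches and the two strip functions are hypothetical stand-ins for the two normalize_code versions rather than Codeflash's actual harness:

```python
from typing import Callable, Iterable, List


def find_mismatches(
    original: Callable[[str], str],
    optimized: Callable[[str], str],
    inputs: Iterable[str],
) -> List[str]:
    """Return every input on which the two implementations disagree."""
    return [src for src in inputs if original(src) != optimized(src)]


# Hypothetical stand-ins: both strip each line and drop blank lines,
# written in two different styles.
def original_strip(source: str) -> str:
    out = []
    for line in source.split("\n"):
        line = line.strip()
        if line:
            out.append(line)
    return "\n".join(out)


def optimized_strip(source: str) -> str:
    stripped = (ln.strip() for ln in source.split("\n"))
    return "\n".join(s for s in stripped if s)


cases = ["", "   \n  \n   ", "int x = 5;\n\n\nint y = 10;", "\t\tint x = 5;\n    int y = 10;"]
print(find_mismatches(original_strip, optimized_strip, cases))  # []
```

An empty list means the optimized version reproduced the original output on every input; any entry returned is a concrete counterexample to investigate.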

To edit these changes, run git checkout codeflash/optimize-pr1199-2026-02-03T12.05.28 and push.

codeflash-ai bot added the labels ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) on Feb 3, 2026.
codeflash-ai bot mentioned this pull request on Feb 3, 2026.
misrasaurabh1 added a commit that referenced this pull request on Feb 3, 2026:
The bug was introduced in commit 06353ea which added a fallback that
applied a single code block to ANY file being processed. This caused
issues like PR #1309 where normalize_java_code was duplicated in
support.py because optimized code for formatter.py was incorrectly
applied to it.

The fix restricts the single-code-block fallback to non-Python languages
only, where flexible path matching is needed (Java/JS/TS). For Python,
exact path matching is now required.

Co-Authored-By: Claude Opus 4.5
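The fix described in this commit can be sketched as follows, assuming optimized code arrives as a mapping from file path to code block. select_code_block and its parameters are hypothetical; the commit message does not show the actual implementation:

```python
from pathlib import Path
from typing import Dict, Optional


def select_code_block(
    target: Path, blocks: Dict[Path, str], language: str
) -> Optional[str]:
    """Choose which optimized code block (if any) to apply to `target`."""
    if target in blocks:
        return blocks[target]  # an exact path match always wins
    if language != "python" and len(blocks) == 1:
        # Java/JS/TS: paths may not line up exactly, so fall back to
        # the only block that was returned.
        return next(iter(blocks.values()))
    # Python: no exact match means nothing is applied, so code meant for
    # formatter.py can no longer land in support.py.
    return None
```

Under this sketch, a single block for formatter.py is ignored when processing support.py in a Python project, while a single Java block is still applied even if its path differs slightly.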
KRRT7 (collaborator) commented on Feb 19, 2026:

Closing stale bot PR.

KRRT7 closed this on Feb 19, 2026.
KRRT7 deleted the codeflash/optimize-pr1199-2026-02-03T12.05.28 branch on February 19, 2026 at 13:00.