⚡️ Speed up function `_extract_class_declaration` by 29% in PR #1199 (omni-java) (#1371)
The optimized code achieves a **29% speedup** (236μs → 183μs, best of 18 runs) through two key optimizations:
## 1. Module-Level Body Type Mapping (7-14% of gains)
The `body_types` dictionary is moved from a local variable recreated on every function call to a module-level constant `_BODY_TYPES`. This eliminates the overhead of dictionary construction for each invocation, saving ~7-14 microseconds per call based on line profiler data (7655ns reduced to effectively zero allocation cost).
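The hoisting pattern can be sketched as follows. This is an illustrative reconstruction, not the actual `context.py` source; the dictionary contents and function names here are hypothetical stand-ins:

```python
# Before: the dict literal is rebuilt on every call, paying
# allocation and hashing costs each time.
def node_kind_slow(node_type: str) -> str:
    body_types = {
        "class_body": "class",
        "interface_body": "interface",
        "enum_body": "enum",
    }
    return body_types.get(node_type, "unknown")

# After: the mapping is a module-level constant, built exactly once
# at import time; each call is just a dict lookup.
_BODY_TYPES = {
    "class_body": "class",
    "interface_body": "interface",
    "enum_body": "enum",
}

def node_kind_fast(node_type: str) -> str:
    return _BODY_TYPES.get(node_type, "unknown")
```

Both functions return identical results; only the per-call allocation cost differs.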
## 2. Deferred UTF-8 Decoding (Primary optimization: 70-85% of gains)
Instead of decoding each child node's bytes individually in the loop:
- **Original**: Decoded each slice separately (`decode("utf8")` called ~1048 times per execution)
- **Optimized**: Collects byte slices first, joins them as bytes with `b" ".join()`, then performs a **single decode** at the end
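The two strategies can be contrasted in a minimal sketch (the sample byte slices below are invented for illustration; the real code slices a parsed source buffer):

```python
# A list of byte slices, standing in for child-node slices of a
# tree-sitter source buffer.
slices = [b"public", b"final", b"class", b"Example"]

# Original approach: one decode("utf8") call per slice, then join strings.
per_slice = " ".join(s.decode("utf8") for s in slices)

# Optimized approach: join the raw bytes first, then decode once.
deferred = b" ".join(slices).decode("utf8")

assert per_slice == deferred  # identical output, far fewer codec calls
```

The results are byte-for-byte identical because joining UTF-8 byte slices with an ASCII space and decoding once yields the same string as decoding each slice and joining with `" "`.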
The line profiler shows the impact:
- Original: Line with `decode("utf8")` takes 524μs (35.7% of total time)
- Optimized: Byte slice collection takes 541μs but final decode+join is only 34μs (3% of total time)
This reduces the UTF-8 decoding overhead from ~35% to ~3% of execution time because:
- UTF-8 decoding has fixed per-call overhead in Python's codec machinery
- Processing one large byte sequence is more cache-efficient than 1000+ small ones
- The CPython decoder can use optimized SIMD paths for larger contiguous buffers
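The per-call overhead claim can be checked on any machine with a small `timeit` harness (this is not the project's benchmark; slice count and sizes are arbitrary):

```python
import timeit

# 1000 small byte slices, roughly mimicking many child-node slices.
slices = [("chunk%d" % i).encode("utf8") for i in range(1000)]

# Strategy A: decode each slice, join the strings.
per_call = timeit.timeit(
    lambda: " ".join(s.decode("utf8") for s in slices), number=200)

# Strategy B: join the bytes, decode once.
single = timeit.timeit(
    lambda: b" ".join(slices).decode("utf8"), number=200)

print(f"per-slice decode: {per_call:.4f}s, single decode: {single:.4f}s")
```

On typical CPython builds the single-decode variant is substantially faster, since the codec machinery is entered once per call instead of once per slice.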
## Test Results Validation
The optimization excels particularly in scenarios with many child nodes:
- **Large scale test** (500 children): 33% faster (200μs → 150μs)
- **UTF-8 heavy test** (unicode characters): 31.8% faster
- **Complex declarations**: 8-13% faster for typical cases
- **Edge cases** (empty/single child): Minimal impact (<2% variation)
This optimization is especially valuable for Java code analysis workloads that parse complex class hierarchies with many modifiers, generics, annotations, or deeply nested type declarations.
⚡️ This pull request contains optimizations for PR #1199. If you approve this dependent PR, these changes will be merged into the original PR branch `omni-java`.

📄 29% (0.29x) speedup for `_extract_class_declaration` in `codeflash/languages/java/context.py`
⏱️ Runtime: 236 microseconds → 183 microseconds (best of 18 runs)
To edit these changes, run `git checkout codeflash/optimize-pr1199-2026-02-04T05.05.57` and push.