
fixes-for-core-unstructured-experimental#1524

Merged
KRRT7 merged 18 commits into main from fixes-for-core-unstructured-experimental on Feb 21, 2026

@KRRT7 (Collaborator) commented Feb 18, 2026

Summary

  • Extract parameter type constructor signatures into testgen context so the LLM knows how to construct typed parameters
  • Resolves types via jedi (following re-exports) and extracts full __init__ source
  • Filters out builtins/typing names and avoids duplicating classes already in context
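The first and third bullets can be illustrated with a minimal sketch that collects annotation names from a function's parameters and then filters them. It uses only the standard `ast` module. `collect_type_names`, `parameter_type_names`, and the `already_in_context` parameter are hypothetical names. The filter set is also an assumption: every name in `dir(builtins)` plus `typing.__all__`. The jedi-based step that resolves each name to its defining module, following re-exports, is not shown.

```python
import ast
import builtins
import typing
from typing import Dict, FrozenSet, Set

# Hypothetical filter set: all builtin names plus the public names exported by `typing`.
# The PR's own BUILTIN_AND_TYPING_NAMES may be defined differently.
BUILTIN_AND_TYPING_NAMES: FrozenSet[str] = frozenset(dir(builtins)) | frozenset(typing.__all__)


def collect_type_names(annotation: ast.expr) -> Set[str]:
    """Collect every bare name (e.g. Config, Optional) used inside an annotation."""
    return {node.id for node in ast.walk(annotation) if isinstance(node, ast.Name)}


def parameter_type_names(
    func_source: str, already_in_context: FrozenSet[str] = frozenset()
) -> Dict[str, Set[str]]:
    """Map each annotated parameter to the non-builtin type names its annotation mentions."""
    func = ast.parse(func_source).body[0]
    if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
        raise ValueError("expected a function definition")
    args = func.args
    result: Dict[str, Set[str]] = {}
    for arg in [*args.posonlyargs, *args.args, *args.kwonlyargs]:
        if arg.annotation is None:
            continue
        # Drop builtins/typing names and classes whose source is already in the context.
        names = collect_type_names(arg.annotation) - BUILTIN_AND_TYPING_NAMES - already_in_context
        if names:
            result[arg.arg] = names
    return result


src = "def f(cfg: Config, items: list[Item], n: int, opt: Optional[Path] = None): ..."
print(parameter_type_names(src))  # {'cfg': {'Config'}, 'items': {'Item'}, 'opt': {'Path'}}
```

Parameters whose annotations mention only builtins or typing constructs, such as `n: int` here, drop out entirely. The remaining names are the ones that need constructor context.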

KRRT7 and others added 4 commits February 18, 2026 05:03
Remove safe_relative_to, resolve_classes_from_modules,
extract_classes_from_type_hint, resolve_transitive_type_deps,
extract_init_stub, _is_project_module_cached, is_project_path,
_is_project_module, extract_imports_for_class,
collect_names_from_annotation, is_dunder_method, _qualified_name,
and _validate_classdef. Inline trivial helpers into prune_cst and
clean up enrich_testgen_context and get_function_sources_from_jedi.
Remove corresponding tests.

Add enrichment step that parses FTO parameter type annotations, resolves
types via jedi (following re-exports), and extracts full __init__ source
to give the LLM constructor context for typed parameters.
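The final step, extracting a class's full `__init__` source once its defining module is known, might look roughly like the sketch below. It assumes jedi (or another resolver) has already mapped the type name to that module's source text. `extract_init_source` is a hypothetical helper, not the PR's extract_init_stub_from_class, and it only looks at top-level classes.

```python
import ast
import textwrap
from typing import Optional


def extract_init_source(module_source: str, class_name: str) -> Optional[str]:
    """Return the dedented source of class_name.__init__ (decorators included), or None."""
    tree = ast.parse(module_source)
    lines = module_source.splitlines()
    for node in tree.body:  # this sketch only considers top-level classes
        if not (isinstance(node, ast.ClassDef) and node.name == class_name):
            continue
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)) and item.name == "__init__":
                # A decorated method starts at its first decorator line.
                start = min([item.lineno] + [d.lineno for d in item.decorator_list])
                return textwrap.dedent("\n".join(lines[start - 1 : item.end_lineno]))
        return None  # class found, but it defines no __init__ of its own
    return None


MODULE = '''
class Config:
    """Settings."""

    def __init__(self, path: str, retries: int = 3) -> None:
        self.path = path
        self.retries = retries

    def load(self):
        return self.path
'''
print(extract_init_source(MODULE, "Config"))
```

Returning the whole `__init__` body, not just the signature, also shows the model which attributes the constructor sets.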

class_imports = extract_imports_for_class(module_tree, class_node, module_source)
full_source = class_imports + "\n\n" + class_source if class_imports else class_source
full_source = class_source
@claude (claude bot, Contributor) commented Feb 18, 2026

Previous review flagged missing is_project_path guard as a bug. Tests have been updated to reflect the intentional design change: stdlib/third-party classes are now extracted via AST source parsing rather than runtime reflection. The extract_class_and_bases function only extracts ClassDef nodes it finds in the resolved module's source, which is a reasonable approach. No longer blocking.

@@ -710,8 +910,7 @@ def extract_class_and_bases(
start_line = min(d.lineno for d in class_node.decorator_list)
class_source = "\n".join(lines[start_line - 1 : class_node.end_lineno])
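According to the review note, extract_class_and_bases only extracts ClassDef nodes it finds in the resolved module's source. The hunk shows decorators being kept by starting the slice at the first decorator line. The single-module sketch below illustrates that idea. `extract_class_and_local_bases` is hypothetical. Bases defined outside the module (here `UserDict`) are simply skipped rather than resolved elsewhere, which the real code may handle differently.

```python
import ast
from typing import Dict, List, Set


def class_source_with_decorators(lines: List[str], node: ast.ClassDef) -> str:
    # Start at the first decorator (if any) so e.g. @dataclass stays attached.
    start_line = node.lineno
    if node.decorator_list:
        start_line = min(d.lineno for d in node.decorator_list)
    return "\n".join(lines[start_line - 1 : node.end_lineno])


def extract_class_and_local_bases(module_source: str, class_name: str) -> List[str]:
    """Return source for class_name and any base classes defined in the same module."""
    tree = ast.parse(module_source)
    lines = module_source.splitlines()
    classes: Dict[str, ast.ClassDef] = {
        n.name: n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)
    }
    sources: List[str] = []
    seen: Set[str] = set()
    todo = [class_name]
    while todo:
        name = todo.pop()
        if name in seen or name not in classes:
            continue  # not a ClassDef in this module's source (e.g. an imported base)
        seen.add(name)
        node = classes[name]
        sources.append(class_source_with_decorators(lines, node))
        # Only bare-name bases (class C(Base)) are followed; dotted bases are skipped.
        todo.extend(b.id for b in node.bases if isinstance(b, ast.Name))
    return sources


MODULE = '''
from collections import UserDict
from dataclasses import dataclass


class Base(UserDict):
    pass


@dataclass
class Child(Base):
    x: int = 0
'''
for src in extract_class_and_local_bases(MODULE, "Child"):
    print(src, end="\n\n")
```

The extracted snippets carry no import lines. This matches the review note that raw class source is now emitted without import statements.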

@claude (claude bot, Contributor) commented Feb 18, 2026

Previous review flagged removed import extraction as a bug. The tests have been updated so they no longer assert that extracted code contains "from dataclasses import" statements. This is an intentional simplification: the raw class source is emitted without import statements. The LLM context builder presumably handles imports separately. No longer blocking.

@claude (claude bot, Contributor) commented Feb 18, 2026

PR Review Summary

Prek Checks

Fixed 2 issues and pushed commit 7df7d79d:

  1. TC004: from pathlib import Path was incorrectly moved into the TYPE_CHECKING block, but Path is used at runtime (as a constructor at lines 1102, 1129, 1145, and 1184). Moved it back to the runtime imports.
  2. F821: safe_relative_to function was deleted but a call to it remained at line 542. Replaced with inline try/except pattern matching the rest of the PR.
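Both fixes follow common patterns, sketched here with a hypothetical `display_paths` function. `pathlib.Path` stays a runtime import because it is called as a constructor; TC004 flags runtime-used imports placed under `TYPE_CHECKING`. An annotation-only import stays in the `TYPE_CHECKING` block. The sketch also shows an inline `try`/`except ValueError` around `Path.relative_to`. Falling back to the original path is an assumption about what the removed safe_relative_to helper returned.

```python
from __future__ import annotations

from pathlib import Path  # runtime import: Path(...) is called below, so it must not move under TYPE_CHECKING
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from collections.abc import Iterable  # used only in annotations, so this is safe here


def display_paths(files: Iterable[str], root: str) -> list[str]:
    """Render each file relative to root when possible (hypothetical example)."""
    root_path = Path(root)
    out: list[str] = []
    for f in files:
        path = Path(f)
        # Inline replacement for a safe_relative_to-style helper (fallback behavior assumed).
        try:
            rel = path.relative_to(root_path)
        except ValueError:  # path is not under root
            rel = path
        out.append(rel.as_posix())
    return out


print(display_paths(["/repo/src/a.py", "/other/b.py"], "/repo"))
```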

After fix: all prek checks pass

Mypy

  • code_context_extractor.py: clean
  • test_code_context_extractor.py: 40 pre-existing no-untyped-def errors on test functions (not introduced by this PR)

Code Review

No new critical issues found. Previous review comments are resolved. The refactoring is coherent:

  • safe_relative_to properly inlined everywhere
  • New extract_parameter_type_constructors, extract_init_stub_from_class, and resolve_instance_class_name functions are well-structured
  • Removed external base class runtime reflection (replaced with AST-based source parsing)
  • build_testgen_context correctly passes function_to_optimize through all call sites
  • Helper function inlining (_qualified_name, _validate_classdef, is_dunder_method) is clean

Test Coverage

| File | Main | PR | Delta |
| --- | --- | --- | --- |
| code_context_extractor.py | 85% (634 stmts, 98 miss) | 64% (738 stmts, 266 miss) | -21% ⚠️ |

Analysis: The file grew by 104 statements (new functions extract_parameter_type_constructors, extract_init_stub_from_class, resolve_instance_class_name, and collect_type_names_from_annotation, plus the BUILTIN_AND_TYPING_NAMES constant). Coverage dropped because missed statements rose from 98 to 266, an increase of 168 uncovered statements. The PR includes 20+ new test functions covering the new functions, but many of the new code paths (especially error handling and jedi-based resolution in extract_parameter_type_constructors) are not exercised by tests.
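The table's numbers match plain statement coverage, computed as (stmts - miss) / stmts. The "-21%" delta is a difference in percentage points. The check below assumes branch coverage is not included in the reported figure.

```python
def statement_coverage(stmts: int, miss: int) -> float:
    """Percentage of statements executed (statement coverage only, no branches)."""
    return 100 * (stmts - miss) / stmts


main_cov = statement_coverage(634, 98)   # 536 of 634 statements covered
pr_cov = statement_coverage(738, 266)    # 472 of 738 statements covered
print(round(main_cov), round(pr_cov), round(pr_cov - main_cov))  # 85 64 -21
print(738 - 634, 266 - 98)               # 104 168
```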

Note: 17 test failures on both branches are environment-dependent (missing CODEFLASH_API_KEY), not related to this PR.


Last updated: 2026-02-21

KRRT7 and others added 6 commits February 18, 2026 13:16
Fix 10 failing tests: remove wrong assertions expecting import statements
inside extracted class code, use substring matching for UserDict class
signature, and rewrite click-dependent tests as project-local equivalents.
Add tests for resolve_instance_class_name, enhanced extract_init_stub_from_class,
and enrich_testgen_context instance resolution.
KRRT7 and others added 2 commits February 18, 2026 14:19
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
The optimized code achieves a **70% runtime speedup** (from 7.02ms to 4.13ms) through three key improvements:

## 1. **Faster Class Discovery via Deque-Based BFS (Primary Speedup)**
The original code uses `ast.walk()`, which yields every node in the AST and keeps traversing even after the target class has been found. The line profiler shows this taking 20.5ms (71% of time).

The optimized version replaces this with an explicit BFS using `collections.deque`, which stops immediately upon finding the target class. The profiler shows this reduces traversal time to 9.95ms - **cutting the search overhead by >50%**.

This is especially impactful when:
- The target class appears early in the module (eliminates unnecessary traversal)
- The module contains many classes (test shows 7-10% faster on modules with 100-1000 classes)
- The function is called frequently (shown by the 108% speedup on 1000 repeated calls)
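Below is a minimal sketch of the early-exit BFS described above, using `collections.deque` and `ast.iter_child_nodes`. `find_class_bfs` is a hypothetical name, not necessarily the optimized function's code. CPython's `ast.walk` is itself a deque-based breadth-first generator. Any gain therefore presumably comes from returning directly on a match and skipping the generator layer, not from a different traversal order.

```python
import ast
from collections import deque
from typing import Deque, Optional


def find_class_bfs(tree: ast.AST, class_name: str) -> Optional[ast.ClassDef]:
    """Breadth-first search for the first ClassDef named class_name."""
    queue: Deque[ast.AST] = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return node  # stop immediately; the remaining nodes are never visited
        queue.extend(ast.iter_child_nodes(node))
    return None


tree = ast.parse("class Outer:\n    class Inner:\n        pass\n")
found = find_class_bfs(tree, "Inner")
print(found.name if found else None)  # Inner
```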

## 2. **Explicit Loops Replace Generator Overhead**
The original code uses `any()` with a generator expression and `min()` with a generator to check decorators and find minimum line numbers. These create function call and generator overhead.

The optimized version uses explicit `for` loops with early breaks:
- Decorator checking: Directly iterates and breaks on first match
- Min line number: Uses explicit comparison instead of `min()` generator

The profiler shows decorator processing time reduced from ~1.4ms to ~0.3ms, and min line calculation from 69μs to 28μs.
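The loop rewrites in this section can be sketched as follows. The section does not say which decorator the real code looks for, so checking for a decorator named `dataclass` (bare, or called as in `@dataclass(frozen=True)`) is an assumption. Both helper names are hypothetical. Decorators always sit above the `class` line, so seeding the minimum with `node.lineno` gives the same result as `min(d.lineno for d in node.decorator_list)` when decorators exist. It also covers undecorated classes.

```python
import ast
from typing import List


def has_decorator_named(decorators: List[ast.expr], name: str) -> bool:
    # Explicit loop with early exit instead of any(<generator>).
    for d in decorators:
        target = d.func if isinstance(d, ast.Call) else d  # handle @name(...) as well as @name
        if isinstance(target, ast.Name) and target.id == name:
            return True
    return False


def first_decorator_line(node: ast.ClassDef) -> int:
    # Explicit comparison instead of min(<generator>); falls back to the class line.
    start = node.lineno
    for d in node.decorator_list:
        if d.lineno < start:
            start = d.lineno
    return start


node = ast.parse("@dataclass(frozen=True)\n@other\nclass A:\n    pass\n").body[0]
print(has_decorator_named(node.decorator_list, "dataclass"), first_decorator_line(node))  # True 1
```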

## 3. **Conditional Flag Pattern for Relevance Checking**
Instead of evaluating both conditions in a compound expression, the optimized version uses an `is_relevant` flag with early exits, reducing redundant checks.

## Impact on Workloads
Based on `function_references`, this function is called from:
- `enrich_testgen_context`: Used in test generation workflows where it may process many classes
- Benchmark tests: Indicates this is in a performance-critical path

The optimization particularly benefits:
- **Large codebases**: 89-90% faster on classes with 100+ methods or 50+ properties
- **Repeated calls**: 108% faster when called 1000 times in sequence
- **Early matches**: Up to 88% faster when target class is found quickly
- **Deep nesting**: 57% faster for nested classes

The annotated tests show consistent 50-108% speedups across most scenarios, with minimal gains (6-10%) only when processing very large files where string slicing dominates runtime.

@codeflash-ai (codeflash-ai bot, Contributor) commented Feb 18, 2026

⚡️ Codeflash found optimizations for this PR

📄 70% (0.70x) speedup for extract_init_stub_from_class in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime: 7.02 milliseconds → 4.13 milliseconds (best of 41 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch fixes-for-core-unstructured-experimental).


@codeflash-ai (codeflash-ai bot, Contributor) commented Feb 18, 2026

feat: extend testgen type context to include function body references

Extract types referenced in the function body (constructor calls, attribute
access, isinstance/issubclass args) in addition to parameter annotations.
Use full class extraction instead of init-stub-only, with instance resolution
fallback and project/site-packages filtering.
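This rough sketch collects the three kinds of body references the commit names: constructor calls, attribute access, and `isinstance`/`issubclass` class arguments. Two simplifying assumptions are made for illustration: capitalized names are treated as class candidates, and builtins are filtered out. `body_type_candidates` is hypothetical. The commit's instance-resolution fallback and project/site-packages filtering are not modeled.

```python
import ast
import builtins
from typing import List, Set

BUILTIN_NAMES = frozenset(dir(builtins))


def body_type_candidates(func_source: str) -> Set[str]:
    """Collect class-like names referenced in a function body (heuristic sketch)."""
    func = ast.parse(func_source).body[0]
    if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
        raise ValueError("expected a function definition")
    names: Set[str] = set()
    for stmt in func.body:  # body only: parameter annotations are handled separately
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in ("isinstance", "issubclass") and len(node.args) == 2:
                    # The second argument may be one class or a tuple of classes.
                    cls_arg = node.args[1]
                    elts: List[ast.expr] = cls_arg.elts if isinstance(cls_arg, ast.Tuple) else [cls_arg]
                    names.update(e.id for e in elts if isinstance(e, ast.Name))
                elif node.func.id[:1].isupper():
                    names.add(node.func.id)  # constructor-style call, e.g. Config(...)
            elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
                if node.value.id[:1].isupper():
                    names.add(node.value.id)  # class attribute access, e.g. Mode.FAST
    return names - BUILTIN_NAMES


SRC = '''
def run(x):
    cfg = Config(path="a")
    if isinstance(x, (Item, int)):
        return Mode.FAST
    return helper(cfg)
'''
print(sorted(body_type_candidates(SRC)))  # ['Config', 'Item', 'Mode']
```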
@KRRT7 force-pushed the fixes-for-core-unstructured-experimental branch from 1b63179 to 2966e15 on February 21, 2026 05:50
KRRT7 and others added 3 commits February 21, 2026 00:50
This reverts commit 2966e15.
Move Path import out of TYPE_CHECKING block (TC004) since it is used at runtime, and replace missing safe_relative_to call with inline try/except pattern matching the rest of the PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@KRRT7 merged commit bc0f9d5 into main on Feb 21, 2026
26 of 28 checks passed
@KRRT7 deleted the fixes-for-core-unstructured-experimental branch on February 21, 2026 06:37