[HYBIM-490] Enhance error reporting for evaluation framework #129

keith-decker · 2026-01-13T17:06:27Z

Enhance error reporting for evaluation framework

Description

This PR improves error handling and observability in the evaluation framework by introducing structured error tracking and enhanced logging throughout the evaluation pipeline.

Changes

New Components:

Added ErrorEvent dataclass for structured error representation with comprehensive context (timestamp, type, severity, component, message, details, and recovery actions)
Added ErrorTracker class for tracking and aggregating errors with rate limiting capabilities and summary statistics
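
The two components above might look roughly like the following. This is a minimal sketch, not the code merged in this PR: the field names follow the description, but the constructor parameters (`max_per_window`, `window_seconds`), the `record()` and `summary()` method names, and the per-type rate-limiting rule are assumptions made for illustration.

```python
from __future__ import annotations

import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ErrorEvent:
    """Structured error record; field names follow the PR description."""

    error_type: str
    severity: str
    component: str
    message: str
    details: dict[str, Any] = field(default_factory=dict)
    recovery_action: str | None = None
    timestamp: float = field(default_factory=time.time)


class ErrorTracker:
    """Stores errors and rate-limits repeats of the same error type (hypothetical rule)."""

    def __init__(self, max_per_window: int = 5, window_seconds: float = 60.0) -> None:
        self._max_per_window = max_per_window
        self._window_seconds = window_seconds
        self._events: list[ErrorEvent] = []
        self._suppressed: Counter[str] = Counter()

    def record(self, event: ErrorEvent) -> bool:
        """Store the event; return False if it was rate-limited instead."""
        cutoff = event.timestamp - self._window_seconds
        recent = sum(
            1
            for e in self._events
            if e.error_type == event.error_type and e.timestamp >= cutoff
        )
        if recent >= self._max_per_window:
            # Too many events of this type in the window: count it, do not store it.
            self._suppressed[event.error_type] += 1
            return False
        self._events.append(event)
        return True

    def summary(self) -> dict[str, Any]:
        """Aggregate stored events by type and component."""
        return {
            "total": len(self._events),
            "by_type": dict(Counter(e.error_type for e in self._events)),
            "by_component": dict(Counter(e.component for e in self._events)),
            "suppressed": dict(self._suppressed),
        }
```

Counting suppressed events separately keeps the summary honest about how many errors occurred even when rate limiting keeps them out of the stored list.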

Enhanced Manager Error Handling:

Improved logging in _enqueue_invocation() to capture queue errors with context
Enhanced _worker_loop() to track and log processing failures with structured data
Added error tracking to evaluator invocations with details about which evaluator failed and why
Improved handler callback error handling to catch failures in _publish_results() without losing evaluation data
Enhanced skip policy evaluation with better error reporting
Improved evaluator configuration parsing with detailed error context and available evaluator listing
Better error messages for unknown evaluators and unsupported invocation types
Added error tracking to evaluator instantiation failures
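
As an illustration of the handler callback item above, the pattern of catching a failure without losing evaluation data could be sketched as follows. The function `publish_results`, its parameters, and the logger name are hypothetical stand-ins, not the actual `_publish_results()` implementation; the `extra` keys mirror the style of the logging call quoted in the review below.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("evals.manager.sketch")  # hypothetical logger name


def publish_results(
    results: list[Any],
    handler_callback: Callable[[list[Any]], None],
    errors: list[dict[str, Any]],
) -> list[Any]:
    """Call the handler, but keep the results even if the handler raises."""
    try:
        handler_callback(results)
    except Exception as exc:  # deliberately broad: a handler bug must not drop data
        context = {
            "error_type": "handler_error",
            "component": "manager",
            "exception": type(exc).__name__,
            "result_count": len(results),
        }
        errors.append(context)
        # Structured fields travel on the log record via `extra`.
        logger.warning("Handler callback failed", extra=context, exc_info=True)
    return results
```

Returning the results regardless of the handler outcome is the design point: the caller still holds the evaluation data and can retry or inspect it.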

New Features:

Added get_error_summary() method to Manager for diagnostic access to tracked errors
Exported ErrorEvent and ErrorTracker from the public API
Comprehensive test coverage for error reporting functionality

Benefits

Better Diagnostics: Structured error data enables easier troubleshooting and monitoring
Operational Resilience: Errors are caught and logged without stopping the evaluation pipeline
Enhanced Observability: Rich context in logs helps track evaluation failures in production
Test Coverage: New test suite validates error handling behavior

adityamehra · 2026-01-22T19:46:59Z

util/opentelemetry-util-genai-evals/src/opentelemetry/util/genai/evals/manager.py

    def has_evaluators(self) -> bool:
        return any(self._evaluators.values())

+    def get_error_summary(self) -> dict[str, Any]:


Can you please capture more details on the diagnostic purpose this API is being exposed for?

adityamehra · 2026-01-22T19:48:23Z

util/opentelemetry-util-genai-evals/src/opentelemetry/util/genai/evals/manager.py

+                    "Evaluator processing failed",
+                    extra={
+                        "error_type": "processing_error",
+                        "component": "worker",


It would be good to have the thread name in the component.

adityamehra · 2026-01-22T19:52:14Z

@keith-decker Once the recent queue and concurrency changes from main are merged, the error handling needs to be updated accordingly.

- Added support for tracking errors by worker name and distinguishing between async and sync errors in ErrorTracker.
- Improved ErrorEvent to include worker name and async context.
- Updated Manager to log detailed error information during queue full and processing failures.
- Added unit tests for concurrent error scenarios and validation of error tracking functionality.
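
One way to capture the worker name and async context mentioned above is sketched here. The helper name `current_worker_context` and the returned keys are assumptions for illustration; the PR's actual fields are not shown on this page. It relies only on `threading.current_thread()` and `asyncio.get_running_loop()` from the standard library.

```python
import asyncio
import threading
from typing import Any


def current_worker_context() -> dict[str, Any]:
    """Report the current thread name and whether an event loop is running."""
    try:
        # Raises RuntimeError when called outside a running event loop.
        asyncio.get_running_loop()
        is_async = True
    except RuntimeError:
        is_async = False
    return {
        "worker_name": threading.current_thread().name,
        "is_async": is_async,
    }
```

A dictionary like this could be merged into the `extra` payload of a log call or into an error record, which would also address the review request to include the thread name alongside the component.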

keith-decker · 2026-01-26T16:42:56Z

@adityamehra This PR has been updated to support concurrency and the queue.

[feature] enhance error reporting for evals

7717cae

keith-decker requested review from a team as code owners January 13, 2026 17:06

keith-decker added 2 commits January 13, 2026 10:13

Linting fixes

c3cfd25

lint fixes

f622c59

adityamehra reviewed Jan 22, 2026

keith-decker added 3 commits January 23, 2026 08:53

Merge branch 'main' into HYBIM-490_evaluation-error-handling

15fa200

Merge branch 'main' into HYBIM-490_evaluation-error-handling

dc8c1c6

adityamehra approved these changes Jan 26, 2026

keith-decker merged commit 57be4dc into main Jan 26, 2026
14 checks passed

keith-decker deleted the HYBIM-490_evaluation-error-handling branch January 26, 2026 22:59

github-actions bot locked and limited conversation to collaborators Jan 26, 2026
