Open
Labels
area/testing (Testing: unit, integration, e2e, performance, etc)
Description
Summary
Follow-up to #586 (Add automated integration tests using mcpchecker/gevals).
The gevals integration tests currently run as a separate workflow (.github/workflows/gevals.yaml) with their own Kind cluster setup. This issue tracks integrating these tests into our existing e2e test infrastructure for better resource efficiency and consistency.
Goals
- Integrate into e2e test suite - Reuse the existing e2e test environment rather than spinning up a separate Kind cluster
- Test isolation - Ensure gevals tests don't interfere with existing e2e tests (setup/teardown, environment state)
- On-demand triggering - Add support for running gevals tests via PR comment (building on #527's /test-e2e infrastructure)
- Release process documentation - Document manual execution of full test suite including gevals tests
Tasks
1. E2E Integration
- Refactor gevals tests to run against the existing e2e Kind cluster setup
- Ensure test isolation:
  - Avoid modifying shared test resources (ConfigMaps, Secrets, MCPServerRegistrations)
  - Use dedicated namespaces or unique resource names if needed
  - Clean up any resources created during gevals tests
- Consider whether gevals tests should run as part of happy or full suite, or as a separate gevals suite
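The isolation items above (unique resource names, guaranteed cleanup) could be sketched roughly as follows. This is only an illustration: the `gevals` prefix, the `unique_name` and `tracked_resources` helpers, and the `delete` callback are all hypothetical and not taken from the existing test code.

```python
import re
import uuid
from contextlib import contextmanager
from typing import Optional

# Hypothetical prefix; the issue does not prescribe a naming scheme.
GEVALS_PREFIX = "gevals"


def unique_name(base: str, run_id: Optional[str] = None) -> str:
    """Build a per-run resource name such as 'gevals-<run_id>-<base>'."""
    run_id = run_id or uuid.uuid4().hex[:8]
    raw = f"{GEVALS_PREFIX}-{run_id}-{base}".lower()
    # Keep only lowercase alphanumerics and '-', as DNS-style names require.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", raw).strip("-")
    # 63 characters is the DNS label limit, which namespace names must satisfy.
    return cleaned[:63].rstrip("-")


@contextmanager
def tracked_resources(delete):
    """Collect names created during a test and delete them on exit, even on failure."""
    created = []
    try:
        yield created
    finally:
        # Delete in reverse creation order so dependents go before what they depend on.
        for name in reversed(created):
            delete(name)
```

A test would append each name it creates to the yielded list, and pass a `delete` callable that wraps whatever client the e2e suite already uses. Cleanup then runs even when an assertion fails partway through.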
2. On-Demand Workflow Updates
Building on #527's /test-e2e happy|full command:
- Add /test-e2e gevals as a new suite option (or integrate into full)
- Update .github/workflows/e2e-on-demand.yaml to support the new suite
- Document when to use gevals tests on PRs (e.g., changes to broker, router, tool routing)
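To make the proposed command concrete, here is a minimal sketch of how a PR comment might be mapped to a suite. It is not the logic in e2e-on-demand.yaml. The `parse_suite` helper is hypothetical, and treating a bare `/test-e2e` as `happy` is an assumption, since the issue does not state a default.

```python
import re
from typing import Optional

# "happy" and "full" come from #527's command; "gevals" is the proposed addition.
SUITES = ("happy", "full", "gevals")
DEFAULT_SUITE = "happy"  # assumption: the default is not stated in the issue

# The command must start a line; an optional suite name follows it.
_COMMAND = re.compile(r"^/test-e2e(?:[ \t]+(\S+))?[ \t]*\r?$", re.MULTILINE)


def parse_suite(comment_body: str) -> Optional[str]:
    """Return the requested suite, or None if the comment holds no valid command."""
    match = _COMMAND.search(comment_body)
    if match is None:
        return None
    suite = (match.group(1) or DEFAULT_SUITE).lower()
    return suite if suite in SUITES else None
```

Rejecting unknown suite names, rather than falling back to a default, keeps a typo such as `/test-e2e geval` from silently starting the wrong run.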
3. Release Process Documentation
- Create or update RELEASE.md with pre-release testing checklist
- Include instructions for manually running:
  - Standard e2e tests (make test-e2e)
  - Gevals integration tests
  - Conformance tests
- Document any required environment setup (LLM API keys, Ollama fallback)
Considerations
Test Environment Isolation
The gevals tests use an LLM agent that interacts with the gateway. Key isolation concerns:
- Tool state: Gevals tests invoke tools - ensure they don't conflict with other test servers
- Server registration: If gevals tests register additional MCP servers, ensure cleanup
- Authentication state: Verify credentials used don't interfere with other tests
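One way to check these concerns is to snapshot the shared resources before and after a gevals run and compare the two. The sketch below assumes each snapshot is a mapping from a resource key (for example `ConfigMap/<namespace>/<name>`) to that object's `metadata.resourceVersion`. How the snapshots are collected is left out, and the `diff_snapshots` helper is hypothetical.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two {resource_key: resourceVersion} snapshots taken around a test run."""
    common = before.keys() & after.keys()
    return {
        # Present only after the run: likely leftovers that were not cleaned up.
        "added": sorted(after.keys() - before.keys()),
        # Present only before the run: shared resources that were deleted.
        "removed": sorted(before.keys() - after.keys()),
        # Same key, different resourceVersion: a shared resource was modified.
        "modified": sorted(k for k in common if before[k] != after[k]),
    }
```

An e2e teardown step could fail the run when any of the three lists is non-empty for resources outside the gevals-owned namespace.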
LLM Dependencies
The gevals workflow falls back to a local Ollama instance if paid LLM secrets aren't available. For e2e integration:
- Consider whether Ollama setup should be part of standard e2e setup
- Document the model caching behavior for local development
- Ensure CI runners have sufficient resources for Ollama if used
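The fallback described above could be expressed as a small selection step that e2e setup runs before deciding whether to start Ollama. This is a sketch under assumptions: the environment variable names `LLM_API_KEY` and `OLLAMA_BASE_URL` are hypothetical placeholders, not the names the gevals workflow actually uses.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; the workflow's real secret names are not given in this issue.
API_KEY_VAR = "LLM_API_KEY"
OLLAMA_URL_VAR = "OLLAMA_BASE_URL"
OLLAMA_DEFAULT_URL = "http://localhost:11434"  # Ollama's default listen address


def select_llm_backend(env: Optional[Mapping[str, str]] = None) -> dict:
    """Prefer a hosted LLM when a key is present, otherwise fall back to local Ollama."""
    env = os.environ if env is None else env
    if env.get(API_KEY_VAR, "").strip():
        return {"backend": "hosted", "needs_ollama": False}
    return {
        "backend": "ollama",
        "needs_ollama": True,
        "base_url": env.get(OLLAMA_URL_VAR, OLLAMA_DEFAULT_URL),
    }
```

Keeping the decision in one place would let the standard e2e setup skip Ollama installation, and its resource cost, whenever `needs_ollama` is false.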
Related
- Follow-up to #586 "Add automated integration tests using mcpchecker (gevals)" (gevals PR)
- Builds on #527 "claude generated e2e on demand workflow" (on-demand e2e workflow)
- Related to #452 "Add integration tests, possibly using gevals" (original integration tests issue)
References
- evals/README.md - Gevals configuration and usage
- gevals documentation
Status
Backlog