Modular ML Experimentation • Automated Decision-Making • Observability-First
This repository contains a complete agent-powered A/B experimentation platform, including automated traffic allocation, real-time metric aggregation, statistical evaluation, a model inference service, a retraining workflow, observability, and human-readable reporting.
The system is designed as a modular, extensible, production-oriented experimentation engine that can be embedded into any product or ML workflow.
📄 The full low-level architecture and agent specifications are provided in the included Technical Report (PDF).
📄 A high-level architecture diagram is provided as a separate PDF illustration.
- Event ingestion & normalization through n8n
- Dynamic experiment routing (A/B or multi-arm)
- FastAPI-based ML service:
  - `/predict`
  - `/stat_test`
  - `/retrain`
  - `/nlq` (natural-language queries)
- Intelligent Agents:
  - Experiment Agent (traffic allocation)
  - Metrics Agent (aggregation)
  - Evaluator Agent (statistical tests)
  - Trainer Agent (model retraining)
- PostgreSQL storage with audit logs, metrics tables, and model registry
- Observability & alerts (low-confidence predictions, pipeline issues)
- PDF reporting for experiment summaries
- Plug-and-play integration with external systems or applications
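As a rough sketch of how an external system might call the ML service, here is a minimal client using only the Python standard library. The base URL and the payload fields (`user_id`, `features`, `experiment`, `window_days`) are illustrative assumptions, not the documented API contract:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local deployment


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON payload to the ML service and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical payloads -- field names are assumptions for illustration.
predict_payload = {"user_id": "u-123", "features": [0.4, 1.7, 0.0]}
stat_test_payload = {"experiment": "exp-1", "window_days": 7}

# Example call (requires a running service):
# result = post_json("/predict", predict_payload)
```

The same helper would serve `/stat_test`, `/retrain`, and `/nlq`, since all endpoints accept JSON bodies.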
```
+--------------------+
|    Data Sources    |
|  (Events, Labels)  |
+---------+----------+
          |
          v
+-----------+-------------+
|     n8n Orchestrator    |
|  - ingestion            |
|  - routing (A/B)        |
|  - triggers & alerts    |
+-----------+-------------+
          |
          v
+-------------------+      FastAPI      +------------------+
| Experiment Agent  | <-------------->  |  Model Registry  |
| - traffic control |                   |   (artifacts)    |
+-------------------+                   +------------------+
          |
          v
+---------+----------+
|   FastAPI ML API   |
|  /predict          |
|  /stat_test        |
|  /retrain          |
+---------+----------+
          |
          v
+---------------------------------------+
|              PostgreSQL               |
| events | labels | ab_metrics | audit  |
+---------------------------------------+
          |
          v
+---------+---------+
|  Reporting Layer  |
|  PDF / dashboards |
+---------+---------+
          |
          v
       End Users
```
This repository includes two primary documents:

- **Technical Report (PDF)**: the full detailed architecture, covering agents, models, decision logic, metrics, constraints, and design rationale.
- **High-Level Architecture Diagram (PDF)**: a clean visualization for presentations and system overviews.
| Layer | Tool | Purpose |
|---|---|---|
| Orchestration | n8n | Event routing, experiment workflows, automation |
| Model Serving | FastAPI | Prediction, evaluation, retraining, NLQ |
| Storage | PostgreSQL | Events, labels, metrics, model registry |
| Backend Logic | Python 3.10+ | ML models, agents, data processing |
| CI/CD | GitHub Actions | Automated build & deploy |
| Containerization | Docker Compose | Local and production-ready deployments |
| Observability | (Optional) Prometheus, Grafana, Sentry | Metrics & alerts |
Each agent is fully described in the Technical Report. Below is a high-level overview:
**Experiment Agent**
- Applies statistical results.
- Selects the winning model.
- Adjusts A/B traffic values in real time.
- Writes decisions to the audit log.
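Real-time traffic adjustment can be sketched as deterministic hash-based bucketing, so the same user always lands in the same variant and a reallocation only moves users at the shifting boundary. The function name and split format below are illustrative assumptions; the actual allocation logic is specified in the Technical Report:

```python
import hashlib


def assign_variant(user_id: str, split: dict[str, float]) -> str:
    """Deterministically map a user to a variant according to the traffic split."""
    # Hash the user ID to a stable number in [0, 1).
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    cumulative = 0.0
    for variant, share in split.items():
        cumulative += share
        if bucket < cumulative:
            return variant
    return variant  # guard against floating-point rounding at the top edge


# Example: a 30/70 split between models A and B.
split = {"A": 0.3, "B": 0.7}
```

Because assignment depends only on the user ID and the split, the agent can change `split` in the database and new requests immediately follow the new allocation.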
**Metrics Agent**
- Periodically aggregates prediction events.
- Computes recall, precision, FPR, and accuracy.
- Stores results in `ab_metrics`.
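The metric computations can be sketched from confusion-matrix counts. This is a minimal illustration assuming binary labels; the real agent aggregates these counts from the events and labels tables:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute the per-variant metrics stored in ab_metrics."""
    def safe_div(a: float, b: float) -> float:
        return a / b if b else 0.0

    return {
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "fpr": safe_div(fp, fp + tn),  # false-positive rate
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
    }


m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
```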
**Evaluator Agent**
- Runs statistical tests via `/stat_test`.
- Generates insights (p-value, confidence interval).
- Produces human-readable conclusions.
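For intuition, the kind of test `/stat_test` might run on conversion-style metrics can be sketched as a two-proportion z-test using only the standard library. This is one plausible test among several; the tests actually used are documented in the Technical Report:

```python
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> dict:
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"z": z, "p_value": p_value, "effect": p_b - p_a}


# Hypothetical counts: 120/1000 conversions for A vs 150/1000 for B.
result = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
```

A p-value below the configured significance threshold would let the Experiment Agent declare a winner and shift traffic accordingly.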
**Trainer Agent**
- Retrains on a Kaggle or internal dataset.
- Applies preprocessing and versioning.
- Updates the Model Registry.
**Observability & Auditing**
- Low-confidence prediction alerts
- Model drift signals
- API latency & error rate monitoring
- Experiment health checks
- Full audit trail for decisions
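A low-confidence alert check can be sketched as a simple threshold filter over prediction records. The threshold value and record shape below are illustrative assumptions; in the platform the threshold is part of the stored configuration:

```python
CONFIDENCE_THRESHOLD = 0.6  # illustrative value; assumed configurable in the DB


def low_confidence_alerts(predictions: list[dict]) -> list[dict]:
    """Return the prediction records whose confidence falls below the threshold."""
    return [p for p in predictions if p["confidence"] < CONFIDENCE_THRESHOLD]


alerts = low_confidence_alerts([
    {"id": 1, "confidence": 0.91},
    {"id": 2, "confidence": 0.42},  # below threshold, should be flagged
])
```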
Configuration is stored in PostgreSQL:
- experiment configuration
- traffic split ratios
- active model version
- metrics history
- audit logs
This makes the system fully dynamic and adjustable without redeployment.
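The "adjustable without redeployment" property follows from agents re-reading configuration from the database at decision time, so an `UPDATE` takes effect immediately. The sketch below uses in-memory SQLite as a stand-in for PostgreSQL, and the `experiment_config` schema is invented for illustration:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE experiment_config (
        experiment    TEXT PRIMARY KEY,
        traffic_split TEXT,   -- JSON, e.g. {"A": 0.5, "B": 0.5}
        active_model  TEXT
    )
    """
)
conn.execute(
    "INSERT INTO experiment_config VALUES (?, ?, ?)",
    ("exp-1", '{"A": 0.5, "B": 0.5}', "model-a-v1"),
)


def current_split(experiment: str) -> str:
    """Agents re-read the split at decision time, so updates apply instantly."""
    row = conn.execute(
        "SELECT traffic_split FROM experiment_config WHERE experiment = ?",
        (experiment,),
    ).fetchone()
    return row[0]


# A live reallocation: no redeploy, just an UPDATE.
conn.execute(
    "UPDATE experiment_config SET traffic_split = ? WHERE experiment = ?",
    ('{"A": 0.3, "B": 0.7}', "exp-1"),
)
```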
User:

"Compare Model A and Model B over the last 7 days and recommend a rollout strategy."

System Response:

```json
{
  "winner": "B",
  "p_value": 0.008,
  "effect_size": 0.12,
  "recommendation": "Increase B to 70% traffic; monitor FPR for 24h."
}
```

| Version | Feature |
|---|---|
| 1.0 | Initial release (A/B testing, agents, metrics) |
| 1.1 | Auto-rollout strategies |
| 1.2 | PDF reports + dashboards |
| 1.3 | Multi-model (A/B/C/D) support |
| 1.4 | Canary deployments & rollback logic |
| 2.0 | Real-time drift detection |
| 2.1 | Feature store integration |
| 3.0 | Full experimentation as a service (EaaS) |
We welcome contributions! Please follow the steps below:
1. Fork the repository
2. Create a feature branch
3. Ensure code follows our style guide
4. Add or update tests
5. Submit a pull request describing your change
6. For major changes, please open an issue beforehand.

For questions or interest in integrating the platform into your product:
- 📧 [email protected]
- 🔗 https://github.com/BorDch
This project was built as part of an engineering challenge and later evolved into a production-capable experimentation platform.
