BorDch/AI-agent-for-A-B-testing

A/B Experimentation Platform with Intelligent Agents

Modular ML Experimentation • Automated Decision-Making • Observability-First



📌 Overview

This repository contains a complete A/B experimentation platform powered by agents. It covers automated traffic allocation, real-time metric aggregation, statistical evaluation, a model inference service, a retraining workflow, observability, and human-readable reporting.

The system is designed as a modular, extensible, production-oriented experimentation engine that can be embedded into any product or ML workflow.

🔍 Full low-level architecture and agent specifications are provided in the included Technical Report (PDF).
📐 A high-level architecture diagram is provided as a separate PDF illustration.


🚀 Key Features

  • Event ingestion & normalization through n8n
  • Dynamic experiment routing (A/B or multi-arm)
  • FastAPI-based ML service:
    • /predict
    • /stat_test
    • /retrain
    • /nlq (natural-language queries)
  • Intelligent Agents:
    • Experiment Agent (traffic allocation)
    • Metrics Agent (aggregation)
    • Evaluator Agent (statistical tests)
    • Trainer Agent (model retraining)
  • PostgreSQL storage with audit logs, metrics tables, and model registry
  • Observability & alerts (low-confidence predictions, pipeline issues)
  • PDF reporting for experiment summaries
  • Plug-and-play integration with external systems or applications

🏗 High-Level Architecture

                       +--------------------+
                       |   Data Sources     |
                       | (Events, Labels)   |
                       +---------+----------+
                                 |
                                 v
                     +-----------+-------------+
                     |     n8n Orchestrator    |
                     |  - ingestion            |
                     |  - routing (A/B)        |
                     |  - triggers & alerts    |
                     +-----------+-------------+
                                 |
                                 v
      +-------------------+    FastAPI      +------------------+
      | Experiment Agent  | <-------------> |  Model Registry  |
      | - traffic control |                 |  (artifacts)     |
      +-------------------+                 +------------------+
                                 |
                                 v
                       +---------+----------+
                       |   FastAPI ML API   |
                       | /predict           |
                       | /stat_test         |
                       | /retrain           |
                       +---------+----------+
                                 |
                                 v
             +---------------------------------------+
             |              PostgreSQL               |
             | events | labels | ab_metrics | audit  |
             +---------------------------------------+
                                 |
                                 v
                       +---------+---------+
                       |  Reporting Layer  |
                       |  PDF / dashboards |
                       +---------+---------+
                                 |
                                 v
                            End Users

The scheme was drawn using Draw.io.

📄 Documentation Structure

This repository includes two primary documents:

  1. Technical Report (PDF)
    Full detailed architecture: agents, models, decision logic, metrics, constraints, and design rationale. See the included report.

  2. High-Level Architecture Diagram (PDF)
    A clean visualization for presentations and system overview.


🔧 Technology Stack

| Layer | Tool | Purpose |
|---|---|---|
| Orchestration | n8n | Event routing, experiment workflows, automation |
| Model Serving | FastAPI | Prediction, evaluation, retraining, NLQ |
| Storage | PostgreSQL | Events, labels, metrics, model registry |
| Backend Logic | Python 3.10+ | ML models, agents, data processing |
| CI/CD | GitHub Actions | Automated build & deploy |
| Containerization | Docker Compose | Local and production-ready deployments |
| Observability (Optional) | Prometheus, Grafana, Sentry | Metrics & alerts |

🧠 Agents Architecture (Short Summary)

Each agent is fully described in the Technical Report.
Below is the high-level overview:

Experiment Agent

  • Applies statistical results.
  • Selects the winning model.
  • Adjusts A/B traffic values in real time.
  • Writes decisions to audit log.
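
The traffic-control step can be illustrated with a minimal sketch. It assumes deterministic hash-based bucketing, which this README does not confirm; `assign_variant`, `shift_traffic`, and the `salt` parameter are hypothetical names introduced only for illustration.

```python
import hashlib


def assign_variant(user_id: str, split: dict[str, float], salt: str = "exp-1") -> str:
    """Map a user to a variant deterministically, following the shares in `split`.

    `split` maps variant name -> traffic share; the shares must sum to 1.
    (Hypothetical helper: the README does not specify the bucketing scheme.)
    """
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    # First 8 hex digits give a roughly uniform number in [0, 1).
    point = int(digest[:8], 16) / 0x100000000
    cumulative = 0.0
    for variant, share in sorted(split.items()):
        cumulative += share
        if point < cumulative:
            return variant
    return sorted(split)[-1]  # guard against floating-point round-off


def shift_traffic(split: dict[str, float], winner: str, new_share: float) -> dict[str, float]:
    """Give `winner` the share `new_share` and rescale the other variants proportionally."""
    if winner not in split or not 0.0 <= new_share <= 1.0:
        raise ValueError("unknown winner or share outside [0, 1]")
    rest = {k: v for k, v in split.items() if k != winner}
    if not rest:
        return {winner: 1.0}
    rest_total = sum(rest.values())
    remaining = 1.0 - new_share
    updated = {
        k: (remaining * v / rest_total if rest_total > 0 else remaining / len(rest))
        for k, v in rest.items()
    }
    updated[winner] = new_share
    return updated
```

Because the assignment depends only on the salt and the user ID, a user keeps the same variant across requests until the split itself changes, for example after `shift_traffic({"A": 0.5, "B": 0.5}, "B", 0.7)`.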

Metrics Agent

  • Periodically aggregates prediction events.
  • Computes recall, precision, FPR, accuracy.
  • Stores results in ab_metrics.
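
The four metrics named above follow directly from confusion-matrix counts. The sketch below assumes binary predictions with known labels; `ConfusionCounts`, `count_events`, and `classification_metrics` are hypothetical names, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ConfusionCounts:
    tp: int = 0  # predicted positive, actually positive
    fp: int = 0  # predicted positive, actually negative
    tn: int = 0  # predicted negative, actually negative
    fn: int = 0  # predicted negative, actually positive


def count_events(events: Iterable[tuple[bool, bool]]) -> ConfusionCounts:
    """Aggregate (predicted, actual) pairs into confusion-matrix counts."""
    c = ConfusionCounts()
    for predicted, actual in events:
        if predicted and actual:
            c.tp += 1
        elif predicted and not actual:
            c.fp += 1
        elif not predicted and actual:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _ratio(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute recall, precision, false-positive rate, and accuracy."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "recall": _ratio(c.tp, c.tp + c.fn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "fpr": _ratio(c.fp, c.fp + c.tn),
        "accuracy": _ratio(c.tp + c.tn, total),
    }
```

Returning `None` for an undefined ratio (for example, precision when nothing was predicted positive) avoids storing a misleading zero for that period.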

Evaluator Agent

  • Runs statistical tests via /stat_test.
  • Generates insights (p-value, confidence interval).
  • Creates human-readable conclusions.
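
This README does not say which test `/stat_test` runs. As one plausible illustration, the sketch below uses a two-sided two-proportion z-test and derives a p-value, a confidence interval, and a short conclusion; the function names and the `alpha` default are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(success_a: int, n_a: int, success_b: int, n_b: int,
                         alpha: float = 0.05) -> dict:
    """Two-sided z-test for the difference in success rates (B minus A)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    diff = p_b - p_a
    # The pooled standard error is used for the test statistic under H0: p_a == p_b.
    pooled = (success_a + success_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se_pooled == 0:
        raise ValueError("no variance in the data; the test is undefined")
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The unpooled standard error is used for the confidence interval of the difference.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return {"diff": diff, "z": z, "p_value": p_value, "ci": ci,
            "significant": p_value < alpha}


def summarize(result: dict) -> str:
    """Turn the test output into a one-line, human-readable conclusion."""
    low, high = result["ci"]
    verdict = "significant" if result["significant"] else "not significant"
    return (f"Difference B - A = {result['diff']:.4f} "
            f"(CI {low:.4f} to {high:.4f}, p = {result['p_value']:.4f}): {verdict}.")
```

The normal approximation is reasonable only for fairly large samples; for small counts an exact test would be a safer choice.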

Trainer Agent

  • Retrains on a Kaggle or internal dataset.
  • Applies preprocessing and versioning.
  • Updates the Model Registry.
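
The versioning step might look roughly like the following sketch. It uses an in-memory list as a stand-in for the registry table; the `ModelRegistry` class and its entry fields are hypothetical and not taken from the Technical Report.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ModelRegistry:
    """In-memory stand-in for the model registry (hypothetical schema)."""
    entries: list = field(default_factory=list)

    def register(self, artifact: bytes, metrics: dict) -> dict:
        """Store a new model version with a content hash and its evaluation metrics."""
        entry = {
            "version": len(self.entries) + 1,  # monotonically increasing version number
            "sha256": hashlib.sha256(artifact).hexdigest(),
            "metrics": dict(metrics),
            "created_at": datetime.now(timezone.utc).isoformat(),
        }
        self.entries.append(entry)
        return entry

    def latest(self) -> Optional[dict]:
        """Return the most recently registered version, or None if the registry is empty."""
        return self.entries[-1] if self.entries else None
```

Hashing the serialized artifact makes it possible to check later that the model being served is the one that was registered.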

🔍 Observability & Alerts

  • Low-confidence prediction alerts
  • Model drift signals
  • API latency & error rate monitoring
  • Experiment health checks
  • Full audit trail for decisions
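
A low-confidence alert could be sketched as follows. It assumes a binary classifier that outputs the probability of the positive class; the `threshold` and `max_share` defaults are illustrative values, not settings from this project.

```python
from typing import Iterable


def low_confidence_share(probabilities: Iterable[float], threshold: float = 0.6) -> float:
    """Fraction of predictions whose confidence max(p, 1 - p) falls below `threshold`."""
    probs = list(probabilities)
    if not probs:
        return 0.0
    low = sum(1 for p in probs if max(p, 1.0 - p) < threshold)
    return low / len(probs)


def should_alert(probabilities: Iterable[float], threshold: float = 0.6,
                 max_share: float = 0.2) -> bool:
    """Raise an alert when too many predictions in the window are low-confidence."""
    return low_confidence_share(probabilities, threshold) > max_share
```

Alerting on the share within a window, instead of on every individual prediction, keeps a single uncertain request from triggering a notification.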

🎛 Configuration

Configuration is stored in PostgreSQL:

  • experiment configuration
  • traffic split ratios
  • active model version
  • metrics history
  • audit logs

This makes the system fully dynamic and adjustable without redeployment.
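
To illustrate why database-backed configuration avoids redeployment, the sketch below reads the traffic split from a table on every call. It uses Python's built-in `sqlite3` purely as a runnable stand-in for PostgreSQL, and the `experiment_config` table, its columns, and the `demo-exp` experiment name are hypothetical.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here so that the example runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment_config ("
    " experiment TEXT, variant TEXT, traffic_share REAL,"
    " PRIMARY KEY (experiment, variant))"
)
conn.executemany(
    "INSERT INTO experiment_config VALUES (?, ?, ?)",
    [("demo-exp", "A", 0.5), ("demo-exp", "B", 0.5)],
)
conn.commit()


def load_split(db: sqlite3.Connection, experiment: str) -> dict:
    """Read the current traffic split; called per request, so changes apply immediately."""
    rows = db.execute(
        "SELECT variant, traffic_share FROM experiment_config WHERE experiment = ?",
        (experiment,),
    ).fetchall()
    return dict(rows)


def update_split(db: sqlite3.Connection, experiment: str, split: dict) -> None:
    """Write new traffic shares in a single transaction."""
    with db:  # commits on success, rolls back on error
        for variant, share in split.items():
            db.execute(
                "UPDATE experiment_config SET traffic_share = ?"
                " WHERE experiment = ? AND variant = ?",
                (share, experiment, variant),
            )
```

After `update_split(conn, "demo-exp", {"A": 0.3, "B": 0.7})`, the next `load_split` call returns the new shares with no service restart.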


🧪 Sample User Query (NLQ)

User:

“Compare Model A and Model B over the last 7 days and recommend a rollout strategy.”

System Response:

{
  "winner": "B",
  "p_value": 0.008,
  "effect_size": 0.12,
  "recommendation": "Increase B to 70% traffic; monitor FPR for 24h."
}
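
A client consuming this response may want to validate it before acting on the recommendation. The sketch below assumes the four fields shown above are always present; the README does not specify the schema, and `NLQResponse` and `parse_nlq_response` are hypothetical names.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = {"winner", "p_value", "effect_size", "recommendation"}


@dataclass(frozen=True)
class NLQResponse:
    winner: str
    p_value: float
    effect_size: float
    recommendation: str


def parse_nlq_response(raw: str) -> NLQResponse:
    """Parse the JSON body and check that the fields are present and plausible."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    response = NLQResponse(
        winner=str(data["winner"]),
        p_value=float(data["p_value"]),
        effect_size=float(data["effect_size"]),
        recommendation=str(data["recommendation"]),
    )
    if not 0.0 <= response.p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return response
```

Parsing into a frozen dataclass gives downstream code typed attributes such as `response.winner` instead of raw dictionary lookups.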

ROADMAP

| Version | Feature |
|---|---|
| 1.0 | Initial release (A/B testing, agents, metrics) |
| 1.1 | Auto-rollout strategies |
| 1.2 | PDF reports + dashboards |
| 1.3 | Multi-model (A/B/C/D) support |
| 1.4 | Canary deployments & rollback logic |
| 2.0 | Real-time drift detection |
| 2.1 | Feature store integration |
| 3.0 | Full experimentation as a service (EaaS) |

🤝 Contribution Guidelines

We welcome contributions! Please follow the steps below:

1. Fork the repository
2. Create a feature branch
3. Ensure code follows our style guide
4. Add or update tests
5. Submit a pull request describing your change
6. For major changes, please open an issue beforehand.

📬 Contact

For questions or interest in integrating the platform into your product:

⭐ Acknowledgements

This project was built as part of an engineering challenge and later evolved into a production-capable experimentation platform.

About

The project implements a specialized AI agent for A/B testing that can upload data, put forward hypotheses, run statistical tests, make predictions, and return the answer to the user in a convenient form. It can be integrated into any ML platform and used to solve a variety of tasks.
