╔══════════════════════════════════════════════════════════════╗ ║ "He had 3 hours of access left. ║ ║ He used every single minute." ║ ╚══════════════════════════════════════════════════════════════╝
⏱ Duration: 3 Hours | 👥 Team Size: 3 Members | 🏆 Max Score: 600 Points
Aryan Kumar built NexaLearn from the ground up.
He was the lead ML engineer — the one who designed the prediction pipeline, wrote the RAG chatbot, set up the FastAPI backend, and shipped the Streamlit dashboard. For two years, he poured everything into it.
Three hours before the official global launch of our AI system, he went rogue after being let go.
HR gave him a brief window to "wrap things up." A courtesy, they called it. Aryan called it something else entirely.
What followed were silent, surgical changes. Some malicious. Some just work he deliberately left incomplete. He knew the codebase better than anyone — well enough to know exactly where to twist the knife without drawing blood.
On the surface, everything looks polished. The UI loads. The API starts. The pipeline runs. No stack traces. No red text.
Nothing screams broken.
But the models train on the wrong target. The chatbot always returns an empty string. The colour scale is inverted. The JWT secret is empty. The database silently rolls back every insert.
He didn't break anything obvious.
He broke everything subtly.
Your team has been called in as emergency contractors. The global launch is in 3 hours. Find what Aryan left behind.
┌────────────────────────────────────────────────────────────┐
│ NexaLearn AI Stack │
├─────────────┬────────────────┬──────────────┬─────────────┤
│ ML Track │ GenAI Track │ API Track │ UI Track │
│ │ │ │ │
│ml_pipeline │ chatbot.py │ api_main.py │ app.py │
│ .py │ │ │ │
│ scikit-learn│ LangChain v1 │ FastAPI + │ Streamlit │
│ + pandas │ Groq LLaMA3 │ SQLite +JWT │ │
│ │ FAISS + HF │ │ │
└─────────────┴────────────────┴──────────────┴─────────────┘
↑
config.py — bugs here
propagate into every file
| File | Role |
|---|---|
config.py |
Central config — bugs propagate everywhere |
requirements.txt |
Dependencies — missing packages |
ml_pipeline.py |
ML track — data cleaning, training, evaluation |
chatbot.py |
GenAI track — LangChain v1-pattern RAG chatbot |
api_main.py |
API track — FastAPI REST backend + auth + DB |
app.py |
UI track — Streamlit dashboard |
Numerous bugs hidden in plain sight.
- Python 3.11+
- A free Groq API key → console.groq.com
- Fork the repository on GitHub: https://github.com/ayanokojix21/Broken-AI
- Clone your fork to your local machine:
git clone https://github.com/<your-username>/Broken-AI
cd Broken-AI- Create a virtual environment
# On Windows:
python -m venv venv
# On macOS/Linux:
python3 -m venv venv- Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate- Set up environment variables
# Copy the example file (Windows: copy .env.example .env)
cp .env.example .env
# Open .env in your editor and add your Groq API Key!- Install dependencies
pip install -r requirements.txtThe ML pipeline needs to generate the model files before the API can start correctly.
# Terminal 1 — ML Pipeline (Run this FIRST)
python ml_pipeline.py
# Terminal 2 — Backend API
uvicorn api_main:app --host 0.0.0.0 --port 8000 --reload
# Terminal 3 — Streamlit Dashboard
streamlit run app.pyOpen: http://localhost:8501
The machine learning pipeline runs from start to finish without crashing. However, the resulting model performs poorly in ways that suggest fundamental flaws in the data science process. This track tests deep scikit-learn fluency and your ability to detect data mishandling, improper scoring, and logical errors in chart generation.
Built on LangChain v1-style runnable composition with FAISS retrieval and per-session history wrapping. It compiles and responds. But under the hood, the entire RAG pipeline is malfunctioning. The AI seems disconnected from the data, behaving like a silent ghost.
FastAPI backend with JWT auth, SQLite persistence, and ML model serving. The endpoints spin up successfully, but the system is riddled with security vulnerabilities, bad REST practices, and failing middleware. Your job is to secure the database and ensure the API behaves correctly.
The dashboard renders. Charts load. The form submits. However, the interactions feel strange. User feedback is contradictory, network requests fail silently, and the visualizations don't match the metrics they claim to represent.
Your fixes will be evaluated dynamically and statically by an automated Python-based judging script. Points are awarded based on the severity and complexity of the bugs you resolve.
Maximum Score: 600 Points
Bonus Points: Submitting a valid deployed Streamlit app URL can earn you extra points.
- Start at
config.py— bugs here may cascade into other files. Fix config first. - A function that doesn't crash is not necessarily correct. Look at what it returns.
- The hardest bugs look intentional — off-by-one thresholds, reversed sort orders, wrong target columns.
- Security bugs rarely raise exceptions. They silently succeed when they should fail.
- Data leakage doesn't crash. It just makes your evaluation metrics completely untrustworthy.
The automated judge will analyze your code directly.
When your 3 hours are up:
- Ensure all your fixes are pushed to your GitHub repository before the deadline.
- Submit your final GitHub repository URL via the provided Google Form.
- All 3 team members must contribute (Git commit history will be checked).
- You don't need to write any bug reports — just fix the code.
- No access to external AI tools during the challenge.
- The automated judge script's evaluation of your code is final.
Good luck. Aryan was thorough.
🎓 Broken AI | Tech Sprint Debugging Challenge