End-to-end phishing URL detection project with:
- A Flask ML API for classifying URLs.
- A static web UI for quick checks.
- Data prep + training scripts for model updates.
https://aiphishdetector.netlify.app/
- backend/: Flask API, model training, and saved model artifacts.
- frontend/: Static single-page UI that calls the API.
- data/: Data conversion and training helper scripts + datasets.
- docs/: Detailed documentation and architecture diagrams.
- Training (backend/train_real.py) builds a TF-IDF + RandomForest model from URL text.
- The Flask API (backend/app_flask.py) loads model.pkl and vectorizer.pkl to score URLs.
- The UI sends a URL to the API and displays the label, score, and reasons.
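
The train-then-score pipeline described above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the contents of backend/train_real.py: the toy URLs, the character n-gram settings, the tree count, and the helper name `score_url` are all assumptions made for the example.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy training data (made up for illustration): 1 = phishing, 0 = benign.
urls = [
    "http://secure-login.example-bank.verify-account.test/update",
    "http://paypa1-confirm.test/signin?session=123",
    "http://free-gift-card.test/claim/now",
    "https://www.example.com/",
    "https://docs.example.org/guide/index.html",
    "https://news.example.net/articles/today",
]
labels = [1, 1, 1, 0, 0, 0]

# Assumption: character n-grams; the real script may tokenize differently.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
features = vectorizer.fit_transform(urls)

# Fixed random_state so repeated runs build the same forest.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(features, labels)


def score_url(url: str) -> float:
    """Return the model's phishing probability for `url` as a 0-100 score."""
    row = vectorizer.transform([url])
    phishing_index = list(model.classes_).index(1)
    return float(model.predict_proba(row)[0][phishing_index]) * 100.0


print(round(score_url("http://verify-account.test/login"), 1))
```

In the project itself, the fitted vectorizer and model are the artifacts saved as vectorizer.pkl and model.pkl, which the Flask API then loads.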
POST /analyze
{ "url": "https://example.com/login" }
Response (example):
{
  "url": "...",
  "label": "SAFE | SUSPICIOUS | MALICIOUS",
  "score": 0-100,
  "reasons": ["..."]
}
- Create and activate a virtual environment (recommended).
- Install dependencies:
pip install -r backend/requirements.txt
- Run the API:
python backend/app_flask.py
Note: debug=True is enabled by default for development. Remember to disable it for production.
- Test:
POST http://127.0.0.1:5000/analyze
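
One way to run that test from Python, using only the standard library, is sketched below. The helper names `build_request` and `analyze` are hypothetical, and the sketch assumes the API accepts a JSON body sent with a `Content-Type: application/json` header and returns the JSON response shape shown earlier.

```python
import json
import urllib.request

# Address of the local development server, as given in the Test step.
API_URL = "http://127.0.0.1:5000/analyze"


def build_request(url_to_check: str, api_url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying {"url": ...} as a JSON body."""
    body = json.dumps({"url": url_to_check}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed to be expected
        method="POST",
    )


def analyze(url_to_check: str, api_url: str = API_URL) -> dict:
    """Send the request and decode the JSON response into a dict."""
    request = build_request(url_to_check, api_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the API running locally:
#   result = analyze("https://example.com/login")
#   print(result["label"], result["score"], result["reasons"])
```

Building the request separately from sending it lets you inspect the payload without a running server.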
There are two training flows:
- Synthetic data: backend/train.py (generates fake URLs).
- Real data: backend/train_real.py (uses data/openphish_norm.csv and data/benign_norm.csv).
Typical real-data flow:
- Put raw files in data/:
  - openphish_raw.txt
  - benign_raw.csv (must include a domain column)
- Run the conversion helper:
  powershell -File data/prepare_and_train.ps1
  - Add -RunTraining to auto-train after conversion.
- Training outputs:
  - backend/model.pkl
  - backend/vectorizer.pkl
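
The conversion step is handled by the PowerShell helper, and its exact output format is not described here. As a rough Python sketch of what such a normalization could look like, the example below assumes that openphish_raw.txt holds one URL per line, that benign domains are turned into `http://` URLs, and that the normalized records have `url` and `label` fields. The function names and the sample data are hypothetical.

```python
import csv
import io


def normalize_benign(raw_csv_text: str) -> list[dict]:
    """Turn CSV rows with a `domain` column into {"url", "label"} records."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    if reader.fieldnames is None or "domain" not in reader.fieldnames:
        raise ValueError("benign_raw.csv must include a 'domain' column")
    records = []
    seen = set()
    for row in reader:
        domain = (row["domain"] or "").strip().lower()
        if not domain or domain in seen:
            continue  # skip blank and duplicate domains
        seen.add(domain)
        # Assumption: benign domains become http:// URLs with label 0.
        records.append({"url": f"http://{domain}/", "label": 0})
    return records


def normalize_phish(raw_text: str) -> list[dict]:
    """Turn one-URL-per-line text into {"url", "label"} records (label 1)."""
    records = []
    seen = set()
    for line in raw_text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicate URLs
        seen.add(url)
        records.append({"url": url, "label": 1})
    return records


# Made-up sample inputs, standing in for the raw files in data/.
sample_benign = "rank,domain\n1,Example.com\n2,example.org\n3,example.com\n"
sample_phish = "http://bad.test/login\n\nhttp://bad.test/login\n"
print(normalize_benign(sample_benign))
print(normalize_phish(sample_phish))
```

Checking for the `domain` column up front mirrors the requirement stated for benign_raw.csv and fails early when the input is malformed.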
frontend/index.html is a static page that calls a backend URL configured in the script.
By default it points to a hosted API (onrender.com). Update the fetch URL if you want to use your local API.
- backend/app_flask.py: Flask API (/analyze)
- backend/train_real.py: TF-IDF + RandomForest training
- backend/model.pkl / backend/vectorizer.pkl: saved model artifacts
- frontend/index.html: static UI
- data/prepare_and_train.ps1: data conversion + optional training
Note: backend/venv/ is checked in; you can ignore it and create your own environment.