PhishDetector

End-to-end phishing URL detection project with:

A Flask ML API for classifying URLs.
A static web UI for quick checks.
Data prep + training scripts for model updates.

Live Demo

https://aiphishdetector.netlify.app/

What’s Inside

backend/ Flask API, model training, and saved model artifacts.
frontend/ Static single-page UI that calls the API.
data/ Data conversion and training helper scripts + datasets.
docs/ Detailed documentation and architecture diagrams.

How It Works

Training (backend/train_real.py) builds a TF‑IDF + RandomForest model from URL text.
The Flask API (backend/app_flask.py) loads model.pkl and vectorizer.pkl to score URLs.
The UI sends a URL to the API and displays the label, score, and reasons.
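To make the featurization step concrete, here is a minimal, standard-library-only sketch of what a TF‑IDF vectorizer computes for URL strings. The README does not say which analyzer or n-gram range train_real.py uses, so character 3- to 5-grams are an assumption here. The IDF formula matches scikit-learn's smoothed default. The RandomForest step is left out, and this code is not interchangeable with the saved vectorizer.pkl.

```python
import math
from collections import Counter


def char_ngrams(text: str, n_min: int = 3, n_max: int = 5) -> list[str]:
    """All character n-grams of length n_min..n_max (assumed analyzer settings)."""
    text = text.lower()
    return [text[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(text) - n + 1)]


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1 for every n-gram seen in the corpus."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc)))  # count each n-gram at most once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc: str, idf: dict[str, float]) -> dict[str, float]:
    """L2-normalized TF-IDF vector as a sparse dict; n-grams unseen during fitting are dropped."""
    tf = Counter(g for g in char_ngrams(doc) if g in idf)
    weights = {g: count * idf[g] for g, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0  # avoid dividing by zero
    return {g: w / norm for g, w in weights.items()}
```

Each URL becomes a sparse, L2-normalized vector keyed by n-gram. N-grams that occur in every training URL (such as "htt") get the lowest IDF, and a classifier such as a RandomForest is then trained on those vectors.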

API

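POST /analyze

Request body:

{ "url": "https://example.com/login" }

Response shape (placeholder values: label is one of SAFE, SUSPICIOUS, or MALICIOUS, and score ranges from 0 to 100):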

{
  "url": "...",
  "label": "SAFE | SUSPICIOUS | MALICIOUS",
  "score": 0-100,
  "reasons": ["..."]
}
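A minimal Python client for this endpoint, using only the standard library. It assumes the local development server from Quick Start (http://127.0.0.1:5000/analyze). The helper names are hypothetical. The response check only encodes the shape documented above (label among the three listed values, score between 0 and 100, reasons as a list), and it accepts score as an int or a float because its exact type is not documented.

```python
import json
import urllib.request

# Assumption: the local dev server from Quick Start; pass api_url to target another deployment.
DEFAULT_API_URL = "http://127.0.0.1:5000/analyze"
ALLOWED_LABELS = {"SAFE", "SUSPICIOUS", "MALICIOUS"}


def build_analyze_request(url: str, api_url: str = DEFAULT_API_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the documented {"url": ...} shape."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_response(payload: dict) -> dict:
    """Raise ValueError if the payload does not match the documented response shape."""
    if payload.get("label") not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {payload.get('label')!r}")
    score = payload.get("score")
    if not isinstance(score, (int, float)) or not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score!r}")
    if not isinstance(payload.get("reasons"), list):
        raise ValueError("reasons must be a list")
    return payload


def analyze(url: str, api_url: str = DEFAULT_API_URL, timeout: float = 10.0) -> dict:
    """Send the URL to /analyze and return the validated JSON response."""
    request = build_analyze_request(url, api_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return check_response(json.loads(response.read().decode("utf-8")))
```

With the API running, `analyze("https://example.com/login")` returns a dict whose `label` and `score` keys can be printed or logged. Passing a different `api_url` points the same client at a deployed backend.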

Quick Start (Backend)

Create and activate a virtual environment (recommended).
Install dependencies:
- pip install -r backend/requirements.txt
Run the API:
- python backend/app_flask.py
- Note: the app runs with debug=True by default for development; disable it before deploying to production.
Test:
- Send a POST request to http://127.0.0.1:5000/analyze with a JSON body such as { "url": "https://example.com/login" }.
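Because debug=True is on by default, one way to avoid shipping it is to make debug mode opt-in through an environment variable. This sketch assumes app_flask.py starts the server with app.run(...), which the README does not show. The variable name PHISHDETECTOR_DEBUG is hypothetical; the app does not read it today.

```python
import os

# Hypothetical variable name; backend/app_flask.py does not read it today.
DEBUG_ENV_VAR = "PHISHDETECTOR_DEBUG"
_TRUTHY = {"1", "true", "yes", "on"}


def debug_from_env(var: str = DEBUG_ENV_VAR) -> bool:
    """Return True only when the variable holds a truthy value; unset means False."""
    return os.environ.get(var, "").strip().lower() in _TRUTHY


# In backend/app_flask.py, the hard-coded flag could then become:
#     app.run(debug=debug_from_env())
```

Leaving the variable unset yields debug=False, so a deployment is safe unless someone opts in, for example with `PHISHDETECTOR_DEBUG=1 python backend/app_flask.py` in a POSIX shell.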

Train / Refresh the Model

There are two training flows:

Synthetic data: backend/train.py (generates fake URLs).
Real data: backend/train_real.py (uses data/openphish_norm.csv and data/benign_norm.csv).

Typical real-data flow:

Put raw files in data/:
- openphish_raw.txt
- benign_raw.csv (must include a domain column)
Run the conversion helper:
- powershell -File data/prepare_and_train.ps1
- Add the -RunTraining switch to train the model automatically after conversion.
Training outputs:
- backend/model.pkl
- backend/vectorizer.pkl
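For environments without PowerShell, the conversion step can be approximated in Python. This is a hypothetical sketch, not a port of prepare_and_train.ps1. It assumes openphish_raw.txt holds one URL per line, turns each benign domain into an https:// URL, and writes url,label columns (1 = phishing, 0 = benign). Check which columns backend/train_real.py actually reads before using these files in place of the helper's output.

```python
import csv
from pathlib import Path


def _write_rows(out_csv: Path, urls: dict[str, None], label: int) -> int:
    """Write a header plus one (url, label) row per URL; return the row count."""
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "label"])  # assumed column names
        writer.writerows((url, label) for url in urls)
    return len(urls)


def normalize_phish(raw_txt: Path, out_csv: Path) -> int:
    """One URL per line (assumed); blank lines skipped, duplicates dropped, label 1."""
    urls: dict[str, None] = {}  # a dict keeps first-seen order while deduplicating
    for line in raw_txt.read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url:
            urls.setdefault(url, None)
    return _write_rows(out_csv, urls, label=1)


def normalize_benign(raw_csv: Path, out_csv: Path) -> int:
    """Read the required 'domain' column, build https:// URLs, label 0."""
    urls: dict[str, None] = {}
    with raw_csv.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if "domain" not in (reader.fieldnames or []):
            raise ValueError(f"{raw_csv} must include a 'domain' column")
        for row in reader:
            domain = (row["domain"] or "").strip().lower()
            if domain:
                urls.setdefault(f"https://{domain}", None)  # scheme prefix is an assumption
    return _write_rows(out_csv, urls, label=0)
```

Running `normalize_phish(Path("data/openphish_raw.txt"), Path("data/openphish_norm.csv"))` and the matching `normalize_benign` call produces the two input files that the real-data training script expects by name.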

Frontend

frontend/index.html is a static page that sends requests to a backend URL set in its script. By default it points to a hosted API on onrender.com; to use your local API instead, change the fetch URL to http://127.0.0.1:5000/analyze.

Project Structure

backend/app_flask.py — Flask API (/analyze)
backend/train_real.py — TF‑IDF + RandomForest training
backend/model.pkl / backend/vectorizer.pkl — saved model artifacts
frontend/index.html — static UI
data/prepare_and_train.ps1 — data conversion + optional training

Notes / Gotchas

backend/venv/ is checked in; you can ignore it and create your own environment.
