"Transforming churn risk into retention strategies with advanced ML."
🔍 Powered by: XGBoost, Random Forest, Optuna, SHAP
🧑‍💻 Engineered by: vishnupriyanpr
PrediChurn is a robust, modular machine learning pipeline for customer churn prediction. Designed for telecom, SaaS, and banking datasets, it automates data wrangling, business-driven feature engineering, model selection, and evaluation, and it produces business insights and analytics dashboards. Its outputs guide retention teams toward targeted, ROI-driven customer strategies.
- 🔄 Multi-model engine: Logistic Regression, Random Forest, XGBoost—all Optuna-optimized
- 🛠️ Feature engineering: Tenure, ARPU, contract/payment, and behavior features with full NaN/infinite safety
- 🔍 Explainable AI: SHAP for both global and local churn driver visualization
- 📊 Business metrics: Churn rate, “revenue at risk”, “potential revenue saved”, intervention ROI
- 📑 Automated reporting: Executive summaries, actionable recommendations, and visualization outputs
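The NaN/infinite safety mentioned for feature engineering usually comes down to guarding ratio features against zero or missing denominators. The sketch below shows one way to do that with pandas. The helper names and the exact feature definitions are assumptions for illustration: `avg_charges_per_tenure` is taken to be `TotalCharges / tenure`, and `high_value_customer` is taken to mean the top quartile of `MonthlyCharges`. The project's own definitions may differ.

```python
import numpy as np
import pandas as pd


def safe_divide(numerator: pd.Series, denominator: pd.Series, fill: float = 0.0) -> pd.Series:
    """Divide element-wise and replace inf/NaN results (e.g. tenure == 0) with `fill`."""
    ratio = numerator.astype(float) / denominator.astype(float)
    return ratio.replace([np.inf, -np.inf], np.nan).fillna(fill)


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add two illustrative features; both definitions are assumptions."""
    out = df.copy()
    # Assumed definition: lifetime charges spread over months of tenure.
    out["avg_charges_per_tenure"] = safe_divide(out["TotalCharges"], out["tenure"])
    # Assumed definition: customers in the top quartile of monthly charges.
    threshold = out["MonthlyCharges"].quantile(0.75)
    out["high_value_customer"] = (out["MonthlyCharges"] >= threshold).astype(int)
    return out


# Toy data covering the edge cases: zero tenure and a missing TotalCharges value.
customers = pd.DataFrame({
    "tenure": [0, 12, 24, 1],
    "MonthlyCharges": [70.0, 50.0, 100.0, 20.0],
    "TotalCharges": [0.0, 600.0, 2400.0, np.nan],
})
features = add_ratio_features(customers)
print(features[["avg_charges_per_tenure", "high_value_customer"]])
```

Filling with `0.0` is a convenient default, but a sentinel value or a separate "was missing" indicator column may suit some models better.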
1. Data Preparation
- Loads raw CSV data
- Cleans missing values and outliers
- Encodes categoricals
- Scales numerical data
2. Feature Engineering
- Generates >10 additional business-focused features (e.g., avg_charges_per_tenure, high_value_customer)
- Handles division-by-zero/NaN/infinite edge cases
3. Modeling and Optimization
- Trains Logistic Regression, Random Forest, and XGBoost models
- Balances training data with SMOTE for rare churn events
- Tunes hyperparameters with Optuna to maximize ROC-AUC
4. Evaluation
- Measures: accuracy, precision, recall, ROC-AUC
- Generates confusion matrix, ROC, Precision-Recall plots
5. Explainability
- Computes and saves SHAP summary and bar plots
- Ranks top churn features both globally and per-customer
6. Business Analytics
- Calculates "revenue at risk", "potential savings", intervention efficiency
- Generates markdown and visual HTML reports
- Lists top churn drivers and segment-wise actionable steps
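The model-selection idea in step 3 can be illustrated with scikit-learn alone. This is a simplified sketch, not the project's training code. It uses synthetic data, compares only Logistic Regression and Random Forest, swaps in `class_weight="balanced"` as a stand-in for SMOTE, and leaves out XGBoost and Optuna tuning to keep dependencies light. All variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data standing in for a churn table (about 25% positives).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.75, 0.25], random_state=0,
)

# class_weight="balanced" is a substitute for SMOTE in this sketch.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=0,
    ),
}

# Stratified folds keep the churn ratio stable across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

If SMOTE is used instead, it should be fitted inside each training fold only, so that synthetic samples never leak into the validation data.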
- Clone Project & Install
  - `git clone https://github.com/vishnupriyanpr/churnguard-ai.git`
  - `cd churnguard-ai`
  - `pip install -r requirements.txt`
- Prepare Dataset
  - Place your CSV data in `data/raw/telco_churn.csv` (Kaggle Telco Churn format recommended)
- Run Pipeline
  - `python main.py`
- View Outputs
  - Metrics, SHAP PNGs, and business report: in `reports/`
  - Model artifacts: in `models/`
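For readers preparing their own CSV, the cleaning, encoding, and scaling steps can be sketched roughly as follows. The column names follow the Kaggle Telco Churn schema, where `TotalCharges` contains a few blank strings. The `prepare` function, the median fill, and the z-score scaling are assumptions for illustration and may not match what `src/data_preprocessor.py` actually does.

```python
import io

import pandas as pd

NUMERIC_COLUMNS = ["tenure", "MonthlyCharges", "TotalCharges"]


def prepare(csv_source) -> pd.DataFrame:
    """Load a Telco-style CSV and return a cleaned, encoded, scaled frame."""
    df = pd.read_csv(csv_source)
    # Blank strings in TotalCharges become NaN, then get the column median.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())
    # Binary target: "Yes" -> 1, anything else -> 0.
    df["Churn"] = (df["Churn"] == "Yes").astype(int)
    # The identifier carries no predictive signal.
    df = df.drop(columns=["customerID"])
    # One-hot encode the categorical column used in this toy example.
    df = pd.get_dummies(df, columns=["Contract"], dtype=int)
    # Z-score scaling; constant columns are set to 0 to avoid dividing by zero.
    for col in NUMERIC_COLUMNS:
        std = df[col].std(ddof=0)
        df[col] = (df[col] - df[col].mean()) / std if std > 0 else 0.0
    return df


# A tiny in-memory sample; in the project the path would be data/raw/telco_churn.csv.
sample_csv = io.StringIO(
    "customerID,tenure,MonthlyCharges,TotalCharges,Contract,Churn\n"
    "A1,1,30.0,30.0,Month-to-month,Yes\n"
    "B2,24,60.0,1440.0,One year,No\n"
    "C3,0,50.0, ,Two year,No\n"
    "D4,12,80.0,960.0,Month-to-month,Yes\n"
)
prepared = prepare(sample_csv)
print(prepared.head())
```

In a real train/test split, the median and the scaling statistics should be computed on the training portion only and then applied to the test portion.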
erDiagram
RAW_DATA {
string customerID
string features
string churn_label
}
PROCESSED_DATA {
string encoded_features
string target
}
ENGINEERED_DATA {
string new_features
}
TRAIN_DATA {
string balanced_features
string balanced_target
}
MODEL {
string model_type
string hyperparameters
string trained_weights
}
METRICS {
float accuracy
float precision
float recall
float roc_auc
}
SHAP_PLOTS {
string summary_plot
string feature_importance
}
BUSINESS_REPORT {
string revenue_at_risk
string recommendations
string top_drivers
}
RAW_DATA ||--o{ PROCESSED_DATA : cleaned_and_preprocessed
PROCESSED_DATA ||--o{ ENGINEERED_DATA : feature_engineered
ENGINEERED_DATA ||--o{ TRAIN_DATA : balanced_with_SMOTE
TRAIN_DATA ||--o{ MODEL : trained_to
MODEL ||--o{ METRICS : generates
MODEL ||--o{ SHAP_PLOTS : explains
METRICS ||--o{ BUSINESS_REPORT : summarized_in
SHAP_PLOTS ||--o{ BUSINESS_REPORT : visualized_in
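The four values held by the METRICS entity can be computed with `sklearn.metrics`. The snippet below is a small sketch on made-up labels and probabilities, using an assumed decision threshold of 0.5. It does not reproduce the project's reported numbers.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up ground truth and predicted churn probabilities for ten customers.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.90, 0.65, 0.30, 0.20, 0.70, 0.55]

# Assumed threshold: predict churn when the probability is at least 0.5.
y_pred = [int(p >= 0.5) for p in y_prob]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),    # 0.70
    "precision": precision_score(y_true, y_pred),  # 0.60
    "recall": recall_score(y_true, y_pred),        # 0.75
    # ROC-AUC is threshold-free, so it takes probabilities, not hard labels.
    "roc_auc": roc_auc_score(y_true, y_prob),      # 0.75
}
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(metrics)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

For churn, recall often matters more than accuracy, because a missed churner typically costs more than an unnecessary retention offer.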
- Accuracy: 78.1%
- Precision: 57.9%
- Recall: 65.0%
- ROC-AUC: 0.822
- Churn Rate: 26.5%
- Revenue at Risk: $374,000
- Potential Revenue Saved: $72,900
- Intervention Efficiency: 57.2%
- Top Churn Drivers:
  - avg_charges_per_tenure (0.132)
  - MonthlyCharges (0.083)
  - charges_trend (0.076)
  - TotalCharges (0.076)
  - price_per_month_ratio (0.075)
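The README does not spell out how "revenue at risk" or "potential revenue saved" are calculated, so the following is only one plausible way to derive such figures. Every formula and parameter here is an assumption: the 12-month horizon, the 0.5 risk threshold, the 20% retention success rate, and the per-customer offer cost are all hypothetical. The output is not meant to reproduce the numbers listed above.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    monthly_charges: float
    churn_probability: float


def business_metrics(
    customers,
    risk_threshold=0.5,          # assumed cut-off for "likely to churn"
    retention_success_rate=0.2,  # assumed share of at-risk revenue that offers retain
    offer_cost=50.0,             # assumed cost of one retention offer
    horizon_months=12,           # assumed revenue horizon
):
    """Return illustrative business figures under the stated assumptions."""
    at_risk = [c for c in customers if c.churn_probability >= risk_threshold]
    revenue_at_risk = sum(c.monthly_charges * horizon_months for c in at_risk)
    potential_saved = revenue_at_risk * retention_success_rate
    campaign_cost = offer_cost * len(at_risk)
    roi = (potential_saved - campaign_cost) / campaign_cost if campaign_cost else 0.0
    return {
        "predicted_churn_rate": len(at_risk) / len(customers) if customers else 0.0,
        "revenue_at_risk": revenue_at_risk,
        "potential_revenue_saved": potential_saved,
        "intervention_roi": roi,
    }


portfolio = [
    Customer(monthly_charges=100.0, churn_probability=0.9),
    Customer(monthly_charges=50.0, churn_probability=0.6),
    Customer(monthly_charges=80.0, churn_probability=0.2),
    Customer(monthly_charges=70.0, churn_probability=0.1),
]
report = business_metrics(portfolio)
print(report)
```

Keeping the assumptions as explicit parameters makes it easy to rerun the figures when finance or marketing supplies better estimates.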
- Immediate Action: Target high-risk (churn prob > 70%) with retention offers
- Monitor Medium-Risk: Engage the 30–70% churn probability group
- Feature Focus: Optimize avg_charges_per_tenure and related drivers
- Ongoing Scoring: Recompute churn risk monthly for all customers
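The probability bands in these recommendations map naturally onto a small segmentation helper. The sketch below uses the thresholds stated above (above 70% is high risk, 30–70% is medium). The function name, the "low" label for everything under 30%, and the treatment of the exact boundary values are assumptions.

```python
from collections import Counter


def risk_segment(churn_probability: float) -> str:
    """Map a churn probability in [0, 1] to a risk band."""
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if churn_probability > 0.70:
        return "high"    # immediate action: retention offers
    if churn_probability >= 0.30:
        return "medium"  # monitor and engage
    return "low"         # assumed label for the remaining customers


# Hypothetical monthly scores for six customers.
monthly_scores = [0.05, 0.30, 0.55, 0.70, 0.71, 0.95]
segments = [risk_segment(p) for p in monthly_scores]
summary = Counter(segments)
print(segments)
print(summary)
```

Rerunning this on each monthly scoring pass gives retention teams a simple count of how many customers moved between bands.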
churnguard-ai/
├── data/
│ ├── raw/
│ └── processed/
├── models/
├── reports/
├── src/
│ ├── data_loader.py
│ ├── data_preprocessor.py
│ ├── feature_engineer.py
│ ├── model_trainer.py
│ ├── model_evaluator.py
│ └── utils.py
├── main.py
├── requirements.txt
└── README.md
MIT License — use, modify, and scale freely!


