A machine learning project that predicts house prices with several regression algorithms. It demonstrates data preprocessing, feature engineering, model training, and model evaluation.
This price prediction system fits regression models to housing data to estimate property values. It implements and compares Linear Regression, Random Forest, and Gradient Boosting to find the most accurate of the three.
- Python 3.8+
- Scikit-Learn
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Jupyter Notebook
price-predictor-ml/
│
├── data/
│   └── housing_data.csv
├── notebooks/
│   └── price_prediction_analysis.ipynb
├── src/
│   ├── data_preprocessing.py
│   ├── feature_engineering.py
│   ├── model_training.py
│   └── model_evaluation.py
├── models/
│   └── best_price_model.pkl
├── requirements.txt
├── README.md
└── .gitignore
- Data Preprocessing: Handles missing values, outliers, and data cleaning
- Feature Engineering: Creates new features and scales existing ones
- Multiple Models: Implements Linear Regression, Random Forest, and Gradient Boosting
- Model Comparison: Evaluates and compares model performance
- Hyperparameter Tuning: Optimizes model parameters using GridSearchCV
- Visualization: Provides comprehensive data analysis and results visualization
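The model comparison and tuning steps listed above could look roughly like the sketch below. It is not the code in `src/model_training.py`: the stand-in data from `make_regression`, the parameter grids, the 3-fold cross-validation, and the choice of MAE as the tuning metric are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data with six numeric features (assumption: not the project's dataset).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each candidate pairs an estimator with a small, illustrative parameter grid.
# An empty grid means "fit once with default parameters".
candidates = {
    "Linear Regression": (LinearRegression(), {}),
    "Random Forest": (
        RandomForestRegressor(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 8]},
    ),
    "Gradient Boosting": (
        GradientBoostingRegressor(random_state=0),
        {"model__n_estimators": [50, 100], "model__learning_rate": [0.05, 0.1]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is refit on every CV fold.
    pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, cv=3, scoring="neg_mean_absolute_error")
    search.fit(X_train, y_train)
    # Score the tuned model on the held-out test split.
    results[name] = float(mean_absolute_error(y_test, search.predict(X_test)))

for name, mae in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: test MAE = {mae:.2f}")
```

Keeping the scaler inside the `Pipeline` avoids leaking statistics from validation folds into training, which matters once the grid search starts comparing configurations.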
The project uses a synthetic housing dataset with six input features and one target column:
- Size (square feet)
- Bedrooms
- Bathrooms
- Location Score (1-10)
- Age (years)
- Garage (0/1)
- Price (target variable)
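As a rough illustration of the columns above, the sketch below generates a table with the same schema and applies two basic cleaning steps. The value ranges, the price formula, and the helper names `make_synthetic_housing` and `preprocess` are hypothetical; they are not how `data/housing_data.csv` was actually produced or cleaned.

```python
import numpy as np
import pandas as pd


def make_synthetic_housing(n_rows: int = 500, seed: int = 0) -> pd.DataFrame:
    """Build an illustrative housing table; ranges and price formula are invented."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        {
            "size": rng.integers(600, 4001, size=n_rows),  # square feet
            "bedrooms": rng.integers(1, 6, size=n_rows),
            "bathrooms": rng.integers(1, 4, size=n_rows),
            "location_score": rng.uniform(1, 10, size=n_rows).round(1),
            "age": rng.integers(0, 61, size=n_rows),  # years
            "garage": rng.integers(0, 2, size=n_rows),  # 0 = no, 1 = yes
        }
    )
    noise = rng.normal(0, 15_000, size=n_rows)
    # Hypothetical linear price model plus Gaussian noise.
    df["price"] = (
        50_000
        + 120 * df["size"]
        + 8_000 * df["bedrooms"]
        + 6_000 * df["bathrooms"]
        + 10_000 * df["location_score"]
        - 700 * df["age"]
        + 12_000 * df["garage"]
        + noise
    ).round(2)
    return df


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values with column medians and clip price outliers (1.5 * IQR)."""
    out = df.copy()
    out = out.fillna(out.median(numeric_only=True))
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["price"] = out["price"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out


housing = preprocess(make_synthetic_housing())
print(housing.head())
```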
| Model | MAE ($) | RMSE ($) | R² Score |
|---|---|---|---|
| Linear Regression | 15,234 | 21,456 | 0.82 |
| Random Forest | 12,890 | 18,234 | 0.87 |
| Gradient Boosting | 11,567 | 17,123 | 0.89 |
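The three metrics in the table can be computed with scikit-learn as in the sketch below. The helper name `regression_report` and the four sample prices are made up for illustration; they are not taken from the project's evaluation code or data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return MAE, RMSE, and R² for a set of predictions."""
    mae = float(mean_absolute_error(y_true, y_pred))
    # RMSE is the square root of the mean squared error.
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    r2 = float(r2_score(y_true, y_pred))
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Made-up prices in dollars, for illustration only.
y_true = [200_000, 310_000, 150_000, 420_000]
y_pred = [210_000, 300_000, 160_000, 400_000]

report = regression_report(y_true, y_pred)
print(report)
```

MAE and RMSE are in the same units as the target (dollars here), while R² is unitless. RMSE penalizes large errors more heavily than MAE, so it is never smaller than MAE on the same predictions.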
- Clone the repository:

git clone https://github.com/username/price-predictor-ml.git
cd price-predictor-ml

- Install dependencies:

pip install -r requirements.txt

- Train the models:

python src/model_training.py

- Open the analysis notebook:

jupyter notebook notebooks/price_prediction_analysis.ipynb

- Make a prediction from Python:

from src.model_training import load_model, predict_price
model = load_model('models/best_price_model.pkl')
price = predict_price(model, size=2000, bedrooms=3, bathrooms=2,
                      location_score=8.5, age=5, garage=1)
print(f"Predicted Price: ${price:,.2f}")

The Gradient Boosting model achieved the best performance with:
- Mean Absolute Error: $11,567
- R² Score: 0.89
- Root Mean Square Error: $17,123
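For readers curious how the `load_model` and `predict_price` helpers from the prediction example might work, here is one possible sketch. The real implementations in `src/model_training.py` are not shown in this README, so the bodies below, the `FEATURES` column order, the use of `pickle`, and the tiny demo training set are all assumptions.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column order; it must match the order used when the model was trained.
FEATURES = ["size", "bedrooms", "bathrooms", "location_score", "age", "garage"]


def load_model(path):
    """Load a pickled model. Only unpickle files from a source you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_price(model, *, size, bedrooms, bathrooms, location_score, age, garage):
    """Build a one-row frame in the training column order and predict its price."""
    row = pd.DataFrame(
        [[size, bedrooms, bathrooms, location_score, age, garage]],
        columns=FEATURES,
    )
    return float(model.predict(row)[0])


# Demo with a throwaway model trained on made-up rows (not the project's data).
train = pd.DataFrame(
    [
        [1000, 2, 1, 5.0, 30, 0],
        [1500, 3, 2, 6.5, 20, 1],
        [2000, 3, 2, 8.0, 10, 1],
        [2500, 4, 3, 9.0, 5, 1],
        [1200, 2, 1, 4.0, 40, 0],
        [3000, 5, 3, 7.5, 15, 1],
    ],
    columns=FEATURES,
)
prices = [180_000, 260_000, 340_000, 430_000, 190_000, 480_000]
demo_model = LinearRegression().fit(train, prices)

# Round-trip the model through a temporary .pkl file.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "demo_model.pkl"
    with open(model_path, "wb") as fh:
        pickle.dump(demo_model, fh)
    restored = load_model(model_path)

price = predict_price(restored, size=2000, bedrooms=3, bathrooms=2,
                      location_score=8.5, age=5, garage=1)
print(f"Predicted Price: ${price:,.2f}")
```

Making the feature arguments keyword-only guards against passing values in the wrong order, which would otherwise produce a plausible-looking but wrong prediction.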
- Fork the repository
- Create a feature branch
- Make changes
- Submit a pull request
This project is licensed under the MIT License.