This folder is a dedicated repo-style package for Problem 4: K-Means Clustering.
This repository is part of the ML Arena family, a set of 5 dedicated repositories, each focused on one core ML problem.
- ML-Arena-1 - Linear Regression
- ML-Arena-2 - Logistic Regression
- ML-Arena-3 - Decision Tree
- ML-Arena-4 - K-Means Clustering (Current Repo)
- ML-Arena-5 - Neural Network
- File: dataset.csv
- Source: UCI Wholesale Customers
- Rows: 440
- Columns: 6 features
- Note: No target variable is provided
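Because there is no target column, every column in dataset.csv is an input feature. The sketch below loads the file into a plain NumPy feature matrix. The helper name `load_features` is hypothetical, and it assumes a single header row with only numeric columns; from a notebook inside exploration/, library/, or scratch/, the path would likely be "../dataset.csv".

```python
import numpy as np


def load_features(path="dataset.csv"):
    """Load the CSV as a float feature matrix (hypothetical helper).

    Assumes one header row and only numeric columns. There is no target
    column, so every column is treated as an input feature.
    """
    X = np.genfromtxt(path, delimiter=",", skip_header=1, dtype=float)
    # genfromtxt returns a 1-D array for a single data row; keep it 2-D.
    X = np.atleast_2d(X)
    # Non-numeric or empty cells are parsed as NaN; fail loudly instead.
    if np.isnan(X).any():
        raise ValueError("non-numeric or missing values found")
    return X
```

On the file described above, X.shape would be expected to be (440, 6).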
K-Means-Clustering/
|- README.md
|- CONTRIBUTING.md
|- dataset.csv
|- exploration/exploration.ipynb
|- library/training.ipynb
|- scratch/training.ipynb
|- PULL_REQUEST_TEMPLATE.md
- Exploration: Data understanding, distributions, correlation, and insights.
- Library: Baseline model with standard ML libraries.
- Scratch: First-principles implementation using NumPy.
- Optimization: Improvements built on top of a library or scratch baseline.
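As a rough illustration of what the Scratch track involves, here is a minimal NumPy sketch of Lloyd's algorithm, the standard K-Means iteration: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. The function name `kmeans`, its parameters, and the plain random initialization (rather than k-means++) are illustrative assumptions, not a reference solution for this repository.

```python
import numpy as np


def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Minimal Lloyd's-algorithm K-Means sketch (hypothetical helper).

    X: (n_samples, n_features) array; k: number of clusters.
    Returns (centroids, labels, inertia).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Plain random init: pick k distinct rows as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()

    for _ in range(n_iters):
        # Assignment step: squared Euclidean distance to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Update step: each centroid moves to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Stop once the centroids have (almost) stopped moving.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    # Final assignment and inertia (within-cluster sum of squared distances).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists[np.arange(len(X)), labels].sum()
    return centroids, labels, inertia
```

Because the result depends on the initial centroids, running it with several seeds and keeping the run with the lowest inertia is a common safeguard.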
Each issue is open to multiple contributors.
- Multiple PRs can be submitted for the same issue.
- Different approaches, implementations, and optimizations are welcome.
Even if someone has already submitted a PR for an issue, you are still encouraged to submit your own solution.
- Fork this folder as its own repo (K-Means-Clustering).
- Clone your fork and create a branch.
- Pick one issue.
- Work in only one notebook per issue, based on its track:
  - exploration/exploration.ipynb
  - library/training.ipynb
  - scratch/training.ipynb
- Run all cells top to bottom.
- Open a pull request whose description includes `Closes #<issue-number>`.
See CONTRIBUTING.md for detailed rules.