This project performs an end-to-end data analysis on the Online Retail II (2010-2011) dataset. The workflow includes data cleaning, storage in a relational database (SQLite), and advanced visualization using Python.
- Top 10 Customers by Spending: Identifies high-value customers using SQL aggregation. A logarithmic scale is applied to the bar plot to better visualize differences despite the highly skewed distribution.
- Top 10 Best-Selling Products: Analyzes inventory movement by identifying the most frequently purchased items.
- Monthly Sales Trends: Tracks revenue over time for the main geographical markets (UK, Netherlands, EIRE, Germany) to identify seasonal patterns.
- Geographical Sales Heatmap: A normalized heatmap showing the percentage of monthly sales contributed by each country, allowing for a direct comparison of market performance.
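The normalization behind the heatmap can be sketched as follows. This is a minimal illustration, not the project's actual code: the `sales` table, its column names, and the toy rows are assumptions made for the example, and the real schema produced by `sales_1.py` may differ.

```python
import sqlite3

# Hypothetical schema and toy rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (invoice_date TEXT, country TEXT, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2011-01-05", "United Kingdom", 10, 3.0),  # revenue 30.0
        ("2011-01-17", "Germany", 5, 2.0),          # revenue 10.0
        ("2011-02-02", "United Kingdom", 4, 5.0),   # revenue 20.0
        ("2011-02-09", "EIRE", 10, 2.0),            # revenue 20.0
        ("2011-02-20", "Netherlands", 5, 2.0),      # revenue 10.0
    ],
)

# Step 1: revenue per (month, country).
# Step 2: divide each value by the total revenue of its month.
query = """
WITH monthly AS (
    SELECT strftime('%Y-%m', invoice_date) AS month,
           country,
           SUM(quantity * price) AS revenue
    FROM sales
    GROUP BY strftime('%Y-%m', invoice_date), country
),
totals AS (
    SELECT month, SUM(revenue) AS total
    FROM monthly
    GROUP BY month
)
SELECT m.month, m.country, 100.0 * m.revenue / t.total AS pct
FROM monthly AS m
JOIN totals AS t ON t.month = m.month
ORDER BY m.month, m.country
"""

# Collect the result as {month: {country: percentage}}.
shares = {}
for month, country, pct in conn.execute(query):
    shares.setdefault(month, {})[country] = round(pct, 1)

for month, by_country in shares.items():
    print(month, by_country)
```

Because every month sums to 100%, months with very different total revenue become directly comparable, which is the point of normalizing before drawing the heatmap.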
- Clone the repository: `git clone https://github.com/your-username/SALES_PROJECT_GH.git`, then `cd SALES_PROJECT_GH`
- Install dependencies: It is recommended to use a virtual environment. `pip install -r requirements.txt`
- Database Creation: Run the first script to process the raw data and generate the SQLite database: `python3 NOTEBOOKS/sales_1.py`
- Run Analysis: Execute the analysis script to generate insights and view the charts: `python3 NOTEBOOKS/sales_project_analysis.py`
Data Handling: SQLite is used to manage the data through relational queries within a Python workflow.
Visualization: Matplotlib and Seaborn are used for the plots. Logarithmic scaling was chosen for bar charts so that a few very large values do not dwarf the remaining bars.
Scalability: Separating ETL (sales_1.py) from analysis (sales_project_analysis.py) keeps the code structure modular and clean.
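The three notes above can be illustrated with a small sketch. It is not the code from `sales_1.py` or `sales_project_analysis.py`: the function names, the `sales` table, its columns, and the sample rows are all hypothetical, chosen only to show an ETL step, a SQL aggregation, and a log-scaled bar chart kept in separate functions.

```python
import sqlite3


def build_database(conn, rows):
    """ETL step (hypothetical stand-in for sales_1.py): load cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id TEXT, quantity INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def top_customers(conn, limit=10):
    """Analysis step: total spending per customer, highest first."""
    return conn.execute(
        """
        SELECT customer_id, SUM(quantity * price) AS total_spent
        FROM sales
        GROUP BY customer_id
        ORDER BY total_spent DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


def plot_top_customers(ranking, path="top_customers.png"):
    """Bar chart with a logarithmic y-axis; requires matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels = [customer for customer, _ in ranking]
    totals = [total for _, total in ranking]
    fig, ax = plt.subplots()
    ax.bar(labels, totals)
    ax.set_yscale("log")  # keeps small bars visible next to very large ones
    ax.set_xlabel("Customer ID")
    ax.set_ylabel("Total spending (log scale)")
    fig.savefig(path)
    plt.close(fig)


# Toy data, for illustration only.
conn = sqlite3.connect(":memory:")
build_database(
    conn,
    [
        ("C001", 1000, 5.0),  # 5000.0
        ("C002", 10, 2.5),    # 25.0
        ("C001", 200, 1.0),   # 200.0
        ("C003", 40, 10.0),   # 400.0
    ],
)
ranking = top_customers(conn, limit=2)
print(ranking)
```

Calling `plot_top_customers(ranking)` would write the chart to a PNG file. A logarithmic axis only works for strictly positive totals, so customers whose net spending is zero or negative (for example after returns) would need to be filtered out first.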