Retail Data Wrangling & Profit Analysis using R
A complete Retail Sales Data Analysis Project developed using R Programming and the Tidyverse ecosystem. This project focuses on data cleaning, wrangling, exploratory data analysis (EDA), and profit analysis across multiple retail datasets from different regions.
Project Overview
This project combines and analyzes retail order datasets from:
Central Region
Western Region
Eastern Region
The datasets contained inconsistencies in:
Column names
Date formats
Numeric formats
Duplicate fields
Missing values
Using R, the project standardizes and cleans the data to create a unified dataset for analysis and reporting.
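As a minimal sketch of that standardization, assuming two regional extracts whose headers, date formats, and currency formatting differ. The column names, the sample values, and the helper `clean_region()` are hypothetical, since the real file layouts are not shown here.

```r
library(dplyr)
library(readr)
library(lubridate)

# Hypothetical miniature extracts. The real column names, date formats,
# and currency prefixes in the regional files are assumptions.
central <- tibble(
  `Order ID`   = c("C-1", "C-2"),
  `Order Date` = c("2017-01-05", "2017-02-10"),
  Sales        = c("120.50", "80.00"),
  Profit       = c("20.10", "-5.00")
)
west <- tibble(
  order_id   = c("W-1", "W-2"),
  order_date = c("01/15/2017", "03/02/2017"),
  sales      = c("USD 1,200.00", "USD 45.99"),
  profit     = c("USD 300.00", "USD 4.50")
)

# Bring one regional table onto a shared schema (hypothetical helper).
clean_region <- function(df, region_name, date_order) {
  df %>%
    # "Order ID" -> "order_id": lower-case, non-alphanumerics to underscores
    rename_with(~ gsub("[^a-z0-9]+", "_", tolower(.x))) %>%
    mutate(
      order_date = as_date(parse_date_time(order_date, orders = date_order)),
      sales      = parse_number(sales),   # drops "USD " and grouping commas
      profit     = parse_number(profit),
      region     = region_name
    ) %>%
    distinct()                            # drop exact duplicate rows
}

# Stack the cleaned regions into one dataset.
combined <- bind_rows(
  clean_region(central, "Central", "ymd"),
  clean_region(west, "West", "mdy")
)
```

Passing the date order per region avoids guessing between day-first and month-first formats, which is ambiguous for values such as `03/02/2017`.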
Technologies & Libraries Used
Programming Language
R Programming
Libraries
tidyverse
dplyr
tidyr
ggplot2
lubridate
readr
stringr
Project Features
Data Wrangling
Cleaned inconsistent datasets
Standardized column formats
Converted date formats
Removed duplicate columns
Parsed numeric values
Combined multiple regional datasets
Exploratory Data Analysis (EDA)
Sales Analysis
Profit Analysis
Shipping Time Analysis
Regional Performance Comparison
Product Category Insights
Data Transformation
Created calculated fields
Generated shipping duration metrics
Unified schema across datasets
Built analysis-ready dataset
Visualization
Profit Trends
Sales Comparisons
Regional Distribution Analysis
Product Category Charts
Project Structure
Retail-Data-Wrangling-Profit-Analysis-R/
│
├── mid+project.pdf        # Project Report & R Code
├── Orders_Central.csv     # Central Region Dataset
├── Orders_West.csv        # Western Region Dataset
├── Orders_East.txt        # Eastern Region Dataset
└── README.md              # Project Documentation
Data Processing Workflow
Step 1: Import Datasets
Loaded CSV and TXT datasets
Imported regional retail order data
Step 2: Data Cleaning
Removed unnecessary columns
Renamed inconsistent attributes
Converted data types
Step 3: Data Wrangling
Standardized datasets
Merged datasets into one master dataset
Step 4: Feature Engineering
Created:
Shipping Duration
Year-Based Metrics
Profit Calculations
Step 5: Analysis & Insights
Performed:
Regional Analysis
Profitability Analysis
Sales Trend Analysis
Customer Insights
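The workflow steps above can be sketched end to end on stand-in data. This is an illustration under stated assumptions, not the project's actual code: the files are written to a temporary directory, the TXT file is assumed to be tab-delimited, and every column name is hypothetical.

```r
library(dplyr)
library(readr)
library(lubridate)

# Stand-in files; the real Orders_*.csv / Orders_East.txt layouts are assumed.
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
writeLines(c(
  "order_id,order_date,ship_date,sales,profit",
  "C-1,2017-01-05,2017-01-09,120.5,20.1",
  "C-2,2018-02-10,2018-02-12,80,-5"
), csv_path)
writeLines(c(
  "order_id\torder_date\tship_date\tsales\tprofit",
  "E-1\t2017-03-01\t2017-03-08\t300\t60"
), txt_path)

# Step 1: import the CSV and the (assumed tab-delimited) TXT file.
central <- read_csv(csv_path, show_col_types = FALSE)
east    <- read_tsv(txt_path, show_col_types = FALSE)

# Step 3: merge into one master dataset, recording the source region.
orders <- bind_rows(Central = central, East = east, .id = "region")

# Step 4: feature engineering.
orders <- orders %>%
  mutate(
    ship_days     = as.integer(difftime(ship_date, order_date, units = "days")),
    order_year    = year(order_date),
    profit_margin = profit / sales
  )

# Step 5: a simple region-by-year summary.
yearly_summary <- orders %>%
  group_by(region, order_year) %>%
  summarise(
    total_sales  = sum(sales),
    total_profit = sum(profit),
    .groups = "drop"
  )
```

Step 2 (cleaning) is omitted here because the stand-in files already share one schema.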
Key Analysis Areas
Profit Analysis
Identified high-profit regions
Compared category-level profits
Evaluated discount impact on profits
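A minimal sketch of these three profit views, assuming a unified table with hypothetical `region`, `category`, `discount`, `sales`, and `profit` columns. The figures are made up for illustration and are not results from the project.

```r
library(dplyr)

# Made-up rows; column names are assumptions about the unified dataset.
orders <- tibble(
  region   = c("Central", "Central", "West", "West", "East", "East"),
  category = c("Furniture", "Technology", "Furniture",
               "Technology", "Office Supplies", "Technology"),
  discount = c(0, 0.2, 0, 0.5, 0.2, 0),
  sales    = c(200, 500, 300, 400, 100, 600),
  profit   = c(40, 50, 90, -120, 10, 180)
)

# Regions ranked by total profit, with overall margin.
profit_by_region <- orders %>%
  group_by(region) %>%
  summarise(
    total_profit = sum(profit),
    margin       = sum(profit) / sum(sales),
    .groups = "drop"
  ) %>%
  arrange(desc(total_profit))

# Category-level profit comparison.
profit_by_category <- orders %>%
  group_by(category) %>%
  summarise(total_profit = sum(profit), .groups = "drop") %>%
  arrange(desc(total_profit))

# Discount impact: average profit per order, discounted vs. full price.
discount_impact <- orders %>%
  mutate(pricing = if_else(discount > 0, "Discounted", "Full price")) %>%
  group_by(pricing) %>%
  summarise(mean_profit = mean(profit), n_orders = n(), .groups = "drop")
```

Splitting on `discount > 0` is the coarsest possible comparison; with real data, binning the discount into several bands would show where profit starts to turn negative.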
Shipping Analysis
Calculated time-to-ship metrics
Compared regional shipping performance
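One way to compute and compare time-to-ship by region, assuming hypothetical `order_date` and `ship_date` columns that are already parsed as dates. The rows are illustrative only.

```r
library(dplyr)
library(lubridate)

# Made-up rows; column names are assumptions about the unified dataset.
orders <- tibble(
  region     = c("Central", "Central", "West", "West", "East"),
  order_date = ymd(c("2017-01-05", "2017-02-10", "2017-01-15",
                     "2017-03-02", "2017-03-01")),
  ship_date  = ymd(c("2017-01-09", "2017-02-12", "2017-01-16",
                     "2017-03-05", "2017-03-08"))
)

shipping_by_region <- orders %>%
  # Date subtraction yields a difftime; convert it to plain days.
  mutate(days_to_ship = as.numeric(ship_date - order_date, units = "days")) %>%
  group_by(region) %>%
  summarise(
    n_orders    = n(),
    mean_days   = mean(days_to_ship),
    median_days = median(days_to_ship),
    max_days    = max(days_to_ship),
    .groups = "drop"
  ) %>%
  arrange(mean_days)   # fastest-shipping region first
```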
Sales Insights
Product category performance
Regional sales comparison
Customer purchasing behavior
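These sales views could be derived along the following lines. The `customer_id` column and all values are hypothetical, and "purchasing behavior" is reduced here to order count, total spend, and average order value per customer.

```r
library(dplyr)

# Made-up rows; column names are assumptions about the unified dataset.
orders <- tibble(
  order_id    = c("C-1", "C-2", "W-1", "W-2", "E-1", "E-2"),
  customer_id = c("AB-100", "AB-100", "CD-200", "EF-300", "CD-200", "AB-100"),
  region      = c("Central", "Central", "West", "West", "East", "East"),
  category    = c("Furniture", "Technology", "Furniture",
                  "Technology", "Office Supplies", "Technology"),
  sales       = c(250, 500, 300, 400, 100, 550)
)

# Product category performance: total sales, largest first.
sales_by_category <- orders %>%
  count(category, wt = sales, name = "total_sales", sort = TRUE)

# Regional comparison with each region's share of overall sales.
sales_by_region <- orders %>%
  group_by(region) %>%
  summarise(total_sales = sum(sales), .groups = "drop") %>%
  mutate(share = total_sales / sum(total_sales)) %>%
  arrange(desc(total_sales))

# Customer purchasing behavior: frequency and spend per customer.
customer_summary <- orders %>%
  group_by(customer_id) %>%
  summarise(
    n_orders        = n_distinct(order_id),
    total_sales     = sum(sales),
    avg_order_value = total_sales / n_orders,
    .groups = "drop"
  ) %>%
  arrange(desc(total_sales), customer_id)
```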
Open:
mid+project.pdf
OR run the R scripts directly.
3. Execute the Code
Run the code sections step-by-step in:
RStudio
Posit Cloud
Jupyter with R Kernel
Skills Demonstrated
Data Cleaning
Data Wrangling
Data Transformation
Exploratory Data Analysis
Statistical Thinking
Data Visualization
Business Insights Generation
Retail Data Analytics
Learning Outcomes
This project provided hands-on experience with:
Real-world messy datasets
Retail business analytics
Data preprocessing techniques
Data integration across multiple sources
Analytical storytelling using R
Developed By
Sai Cholik Vempati
Master's in Information Systems
GitHub Profile
LinkedIn Profile
Project Highlights
Multi-region retail dataset integration
Advanced data wrangling using Tidyverse
Business-focused profit analysis
Shipping performance analytics
Clean and structured analytical workflow