Skip to content

imraningithub/vendor-performance-analysis-sql-python-pwerbi

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🍷 Vendor Performance Analysis — Beverage & Spirits Retail

An end-to-end data analytics project analyzing vendor and product performance for a multi-store beverage and spirits retailer. The project covers the full pipeline from raw data ingestion to an interactive Power BI dashboard, with exploratory analysis, feature engineering, and statistical hypothesis testing in between.


📊 Business Problem

A beverage and spirits retailer operating across multiple stores and cities needed answers to critical procurement and profitability questions:

  • Which vendors generate the most revenue — and which consume budget without adequate return?
  • Are there hidden high-margin products being undersold due to lack of visibility?
  • Does purchasing in bulk actually reduce unit costs, and by how much?
  • Is the underperformance of low-tier vendors statistically meaningful or just noise?
  • Which vendors are tying up cash in unsold inventory?

🔑 Key Findings

Metric Result
Total Sales $452M
Total Purchases $322M
Gross Profit $129.72M
Avg Profit Margin 38.8%
Unsold Inventory $9M
Top 10 vendor purchase concentration 65.69%
Bulk order unit cost saving vs small orders −72.4% ($39.07 → $10.78)
High-margin, low-sales brands identified 198
T-test result (top vs low vendors) T = −17.67, p < 0.0001 ✅ Reject H₀

🗂️ Project Structure

vendor-performance-analysis/
│
├── data/                          # Raw CSV source files (not included — see Data section)
│   ├── begin_inventory.csv
│   ├── end_inventory.csv
│   ├── purchases.csv
│   ├── purchase_prices.csv
│   ├── sales.csv
│   └── vendor_invoice.csv
│
├── Ingestion.ipynb                # Step 1: Load CSVs into DuckDB
├── Exploretey_Data_Analysis.ipynb # Step 2: EDA + feature engineering + vendor_sales_summary
├── Vendor_Performance_Analysis.ipynb  # Step 3: Analysis, visualizations & statistical tests
│
├── vendor_sales_summary_final.xls # Exported summary table (Power BI source)
│
├── dashboard/
│   └── Vendor_Performance_Dashboard.png  # Power BI dashboard screenshot
│
├── report/
│   └── Vendor_Performance_Report.docx    # Full professional analysis report
│
└── README.md

🔄 Pipeline Overview

Raw CSVs (6 files, ~15.6M rows)
        │
        ▼
[ Ingestion.ipynb ]
  → DuckDB (inventory.db)
  → 6 tables created in 30 seconds
        │
        ▼
[ Exploretey_Data_Analysis.ipynb ]
  → Multi-CTE SQL joins all 6 tables
  → Computes: GrossProfit, ProfitMargin, StockTurnover, SalesToPurchaseRatio
  → Cleans & exports: vendor_sales_summary (10,692 rows → 8,564 valid)
        │
        ▼
[ Vendor_Performance_Analysis.ipynb ]
  → EDA: distributions, correlation heatmap, summary statistics
  → Vendor ranking & purchase concentration analysis
  → Bulk purchasing analysis
  → High-margin / low-sales brand discovery (198 brands)
  → Welch's T-test: top vs low vendor profit margins
        │
        ▼
[ Power BI Dashboard ]
  → KPI cards, donut chart, bar charts, scatter plot, low performer highlight

📁 Data Sources

The raw data consists of 6 CSV files representing one full year (Jan 1 – Dec 31, 2024) of retail operations:

File Rows Description
begin_inventory.csv 206,529 Opening stock by store, brand, city
end_inventory.csv 224,489 Closing stock at year end
purchases.csv 2,372,474 All purchase orders with vendor & pricing details
purchase_prices.csv 12,261 Reference: brand-level pricing per vendor
sales.csv 12,825,363 Daily sales transactions including excise tax
vendor_invoice.csv 5,543 Freight/logistics cost per vendor

⚠️ Raw data files are not included in this repository due to size. Place your CSVs in a /data folder before running the notebooks.


⚙️ Engineered Features (vendor_sales_summary)

The core analytical table is built via a multi-CTE SQL query and enriched with these KPIs:

Feature Formula
GrossProfit Total_sales_dollars − TotalPurchaseDollars
ProfitMargin (GrossProfit / Total_sales_dollars) × 100
StockTurnover Total_sales_quantity / TotalPurchaseQuantity
SalesToPurchaseRatio Total_sales_dollars / TotalPurchaseDollars

📈 Visualizations

Chart Insight
Distribution Histograms Most metrics are right-skewed — a few top performers dominate
Correlation Heatmap PurchasePrice vs ActualPrice: 0.99 | GrossProfit vs TotalPurchaseDollars: 0.96
Scatter Plot (Sales vs Margin) 198 red-dot brands identified in the high-margin, low-sales quadrant
Donut Chart DIAGEO leads at 16.3%; top 10 total = 65.69% of all purchases
CI Histogram Top vendors (mean 31.18%) vs Low vendors (mean 41.57%) — clearly distinct distributions
Power BI Dashboard Interactive: KPI cards, vendor ranking, low performer bar chart, scatter by target brand

🧪 Statistical Testing

A two-sample Welch's T-test was conducted to validate whether the profit margin difference between top-performing and low-performing vendors is statistically significant.

  • H₀: No significant difference in profit margins between the two groups
  • H₁: A significant difference exists
Top Vendors (top 25% by sales) Low Vendors (bottom 25% by sales)
Mean Profit Margin 31.18% 41.57%
95% Confidence Interval (30.74%, 31.61%) (40.50%, 42.64%)

Result: T = −17.6695, p = 0.0000 → Reject H₀

The two vendor groups operate in fundamentally different profitability regimes. This statistically validates using performance tiering as the basis for vendor management decisions.


📌 Key Recommendations

  1. Diversify vendor portfolio — reduce 65.69% top-10 concentration to below 55% within 12 months
  2. Consolidate orders — shift volume to large-order tiers for 72% unit cost savings
  3. Promote 198 hidden-gem brands — high-margin products with very low current sales exposure
  4. Issue improvement plans to 5 vendors with stock turnover < 1.0 (TRINCHERO, MAJESTIC FINE WINES, CRUSH WINES, LAIRD & CO, SEA BREEZE CELLARS)
  5. Resolve $9M unsold inventory — urgent SKU-level audit across stores
  6. Automate the pipeline — monthly DuckDB refresh + Power BI Service for real-time procurement decisions

🛠️ Tech Stack

Layer Tool
Database DuckDB
Data Wrangling Python, Pandas, SQL (CTEs)
Statistical Testing SciPy (ttest_ind)
Visualization Matplotlib, Seaborn
Dashboard Microsoft Power BI
Environment Jupyter Notebook, Conda

🚀 How to Run

1. Clone the repository

git clone https://github.com/your-username/vendor-performance-analysis.git
cd vendor-performance-analysis

2. Install dependencies

pip install duckdb pandas numpy matplotlib seaborn scipy openpyxl

3. Add your data

Place the 6 CSV files inside a /data folder in the project root.

4. Run notebooks in order

# Step 1 — Ingest CSVs into DuckDB
jupyter notebook Ingestion.ipynb

# Step 2 — EDA + build vendor_sales_summary
jupyter notebook Exploretey_Data_Analysis.ipynb

# Step 3 — Performance analysis + statistical tests
jupyter notebook Vendor_Performance_Analysis.ipynb

5. Open the dashboard

Import vendor_sales_summary_final.xls into Power BI Desktop to explore the dashboard.


📄 Report

A full written analysis report is available in /report/Vendor_Performance_Report.docx, covering:

  • Business problem definition
  • Data architecture & pipeline
  • EDA findings with all embedded charts
  • Vendor performance tables
  • Statistical test results
  • 6 prioritized recommendations

👤 Author

Imran Khan Student — Digital Technology and Management (DTM) Ostbayerisch Technisch Hochschule Amberg-Weiden Email: iamkhan7694@gmail.com Linkdin: (https://www.linkedin.com/in/imran-khan-7296681b5/)


📃 License

This project is for educational and portfolio purposes.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors