🍷 Vendor Performance Analysis — Beverage & Spirits Retail

An end-to-end data analytics project analyzing vendor and product performance for a multi-store beverage and spirits retailer. The project covers the full pipeline from raw data ingestion to an interactive Power BI dashboard, with exploratory analysis, feature engineering, and statistical hypothesis testing in between.

📊 Business Problem

A beverage and spirits retailer operating across multiple stores and cities needed answers to critical procurement and profitability questions:

Which vendors generate the most revenue — and which consume budget without adequate return?
Are there hidden high-margin products being undersold due to lack of visibility?
Does purchasing in bulk actually reduce unit costs, and by how much?
Is the underperformance of low-tier vendors statistically meaningful or just noise?
Which vendors are tying up cash in unsold inventory?

🔑 Key Findings

Metric	Result
Total Sales	$452M
Total Purchases	$322M
Gross Profit	$129.72M
Avg Profit Margin	38.8%
Unsold Inventory	$9M
Top 10 vendor purchase concentration	65.69%
Bulk order unit cost saving vs small orders	−72.4% ($39.07 → $10.78)
High-margin, low-sales brands identified	198
T-test result (top vs low vendors)	T = −17.67, p < 0.0001 ✅ Reject H₀

🗂️ Project Structure

vendor-performance-analysis/
│
├── data/                          # Raw CSV source files (not included — see Data section)
│   ├── begin_inventory.csv
│   ├── end_inventory.csv
│   ├── purchases.csv
│   ├── purchase_prices.csv
│   ├── sales.csv
│   └── vendor_invoice.csv
│
├── Ingestion.ipynb                # Step 1: Load CSVs into DuckDB
├── Exploretey_Data_Analysis.ipynb # Step 2: EDA + feature engineering + vendor_sales_summary
├── Vendor_Performance_Analysis.ipynb  # Step 3: Analysis, visualizations & statistical tests
│
├── vendor_sales_summary_final.xls # Exported summary table (Power BI source)
│
├── dashboard/
│   └── Vendor_Performance_Dashboard.png  # Power BI dashboard screenshot
│
├── report/
│   └── Vendor_Performance_Report.docx    # Full professional analysis report
│
└── README.md

🔄 Pipeline Overview

Raw CSVs (6 files, ~15.6M rows)
        │
        ▼
[ Ingestion.ipynb ]
  → DuckDB (inventory.db)
  → 6 tables created in 30 seconds
        │
        ▼
[ Exploretey_Data_Analysis.ipynb ]
  → Multi-CTE SQL joins all 6 tables
  → Computes: GrossProfit, ProfitMargin, StockTurnover, SalesToPurchaseRatio
  → Cleans & exports: vendor_sales_summary (10,692 rows → 8,564 valid)
        │
        ▼
[ Vendor_Performance_Analysis.ipynb ]
  → EDA: distributions, correlation heatmap, summary statistics
  → Vendor ranking & purchase concentration analysis
  → Bulk purchasing analysis
  → High-margin / low-sales brand discovery (198 brands)
  → Welch's T-test: top vs low vendor profit margins
        │
        ▼
[ Power BI Dashboard ]
  → KPI cards, donut chart, bar charts, scatter plot, low performer highlight

📁 Data Sources

The raw data consists of 6 CSV files representing one full year (Jan 1 – Dec 31, 2024) of retail operations:

File	Rows	Description
`begin_inventory.csv`	206,529	Opening stock by store, brand, city
`end_inventory.csv`	224,489	Closing stock at year end
`purchases.csv`	2,372,474	All purchase orders with vendor & pricing details
`purchase_prices.csv`	12,261	Reference: brand-level pricing per vendor
`sales.csv`	12,825,363	Daily sales transactions including excise tax
`vendor_invoice.csv`	5,543	Freight/logistics cost per vendor

⚠️ Raw data files are not included in this repository due to size. Place your CSVs in a /data folder before running the notebooks.

⚙️ Engineered Features (vendor_sales_summary)

The core analytical table is built via a multi-CTE SQL query and enriched with these KPIs:

Feature	Formula
`GrossProfit`	`Total_sales_dollars − TotalPurchaseDollars`
`ProfitMargin`	`(GrossProfit / Total_sales_dollars) × 100`
`StockTurnover`	`Total_sales_quantity / TotalPurchaseQuantity`
`SalesToPurchaseRatio`	`Total_sales_dollars / TotalPurchaseDollars`

📈 Visualizations

Chart	Insight
Distribution Histograms	Most metrics are right-skewed — a few top performers dominate
Correlation Heatmap	PurchasePrice vs ActualPrice: 0.99 \| GrossProfit vs TotalPurchaseDollars: 0.96
Scatter Plot (Sales vs Margin)	198 red-dot brands identified in the high-margin, low-sales quadrant
Donut Chart	DIAGEO leads at 16.3%; top 10 total = 65.69% of all purchases
CI Histogram	Top vendors (mean 31.18%) vs Low vendors (mean 41.57%) — clearly distinct distributions
Power BI Dashboard	Interactive: KPI cards, vendor ranking, low performer bar chart, scatter by target brand

🧪 Statistical Testing

A two-sample Welch's T-test was conducted to validate whether the profit margin difference between top-performing and low-performing vendors is statistically significant.

H₀: No significant difference in profit margins between the two groups
H₁: A significant difference exists

	Top Vendors (top 25% by sales)	Low Vendors (bottom 25% by sales)
Mean Profit Margin	31.18%	41.57%
95% Confidence Interval	(30.74%, 31.61%)	(40.50%, 42.64%)

Result: T = −17.6695, p = 0.0000 → Reject H₀

The two vendor groups operate in fundamentally different profitability regimes. This statistically validates using performance tiering as the basis for vendor management decisions.

📌 Key Recommendations

Diversify vendor portfolio — reduce 65.69% top-10 concentration to below 55% within 12 months
Consolidate orders — shift volume to large-order tiers for 72% unit cost savings
Promote 198 hidden-gem brands — high-margin products with very low current sales exposure
Issue improvement plans to 5 vendors with stock turnover < 1.0 (TRINCHERO, MAJESTIC FINE WINES, CRUSH WINES, LAIRD & CO, SEA BREEZE CELLARS)
Resolve $9M unsold inventory — urgent SKU-level audit across stores
Automate the pipeline — monthly DuckDB refresh + Power BI Service for real-time procurement decisions

🛠️ Tech Stack

Layer	Tool
Database	DuckDB
Data Wrangling	Python, Pandas, SQL (CTEs)
Statistical Testing	SciPy (`ttest_ind`)
Visualization	Matplotlib, Seaborn
Dashboard	Microsoft Power BI
Environment	Jupyter Notebook, Conda

🚀 How to Run

1. Clone the repository

git clone https://github.com/your-username/vendor-performance-analysis.git
cd vendor-performance-analysis

2. Install dependencies

pip install duckdb pandas numpy matplotlib seaborn scipy openpyxl

3. Add your data

Place the 6 CSV files inside a /data folder in the project root.

4. Run notebooks in order

# Step 1 — Ingest CSVs into DuckDB
jupyter notebook Ingestion.ipynb

# Step 2 — EDA + build vendor_sales_summary
jupyter notebook Exploretey_Data_Analysis.ipynb

# Step 3 — Performance analysis + statistical tests
jupyter notebook Vendor_Performance_Analysis.ipynb

5. Open the dashboard

Import vendor_sales_summary_final.xls into Power BI Desktop to explore the dashboard.

📄 Report

A full written analysis report is available in /report/Vendor_Performance_Report.docx, covering:

Business problem definition
Data architecture & pipeline
EDA findings with all embedded charts
Vendor performance tables
Statistical test results
6 prioritized recommendations

👤 Author

Imran Khan Student — Digital Technology and Management (DTM) Ostbayerisch Technisch Hochschule Amberg-Weiden Email: iamkhan7694@gmail.com Linkdin: (https://www.linkedin.com/in/imran-khan-7296681b5/)

📃 License

This project is for educational and portfolio purposes.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
Dashboard		Dashboard
Notebook		Notebook
images		images
.gitignore		.gitignore
README.md		README.md
Vendor_Performance_Report_finalpdf.pdf		Vendor_Performance_Report_finalpdf.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🍷 Vendor Performance Analysis — Beverage & Spirits Retail

📊 Business Problem

🔑 Key Findings

🗂️ Project Structure

🔄 Pipeline Overview

📁 Data Sources

⚙️ Engineered Features (vendor_sales_summary)

📈 Visualizations

🧪 Statistical Testing

📌 Key Recommendations

🛠️ Tech Stack

🚀 How to Run

📄 Report

👤 Author

📃 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🍷 Vendor Performance Analysis — Beverage & Spirits Retail

📊 Business Problem

🔑 Key Findings

🗂️ Project Structure

🔄 Pipeline Overview

📁 Data Sources

⚙️ Engineered Features (vendor_sales_summary)

📈 Visualizations

🧪 Statistical Testing

📌 Key Recommendations

🛠️ Tech Stack

🚀 How to Run

📄 Report

👤 Author

📃 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages