
Spotify Azure Data Engineering Pipeline
Medallion Architecture + Power BI Analytics
Overview
This project is an end-to-end Spotify data pipeline built with PySpark on Azure Databricks. It follows a Bronze → Silver → Gold (Medallion) architecture: raw ingestion into Bronze, cleaned Delta tables in Silver, and analytics-ready models in Gold. The Gold datasets power an interactive Power BI dashboard with insights on tracks, artists, and listening trends.
What I Built
- Ingested Spotify data into the Bronze layer (raw storage)
- Handled incremental ingestion using Auto Loader
- Cleaned, deduplicated, and merged data in the Silver layer with Delta Lake
- Modeled a star schema (fact + dimension tables) in the Gold layer
- Supported backfills, schema changes, and CDC-style incremental updates without full reloads
- Built Power BI dashboards for analytics and visualization
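
As a rough illustration of the Bronze and Silver steps listed above, here is a minimal PySpark sketch. It is not the project's actual code: the function names, table names, paths, and the `track_id` / `updated_at` columns are all hypothetical, and it assumes a Databricks runtime where Auto Loader (the `cloudFiles` source) and the `delta` Python package are available.

```python
# Sketch only: every path, table name, and column name here is a placeholder.

def auto_loader_options(schema_location, file_format="json"):
    """Reader options for Databricks Auto Loader (the `cloudFiles` source)."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Evolve the tracked schema when new columns appear; the stream
        # stops once and picks up the new columns on restart.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def ingest_to_bronze(spark, source_path, bronze_table, checkpoint_path, schema_location):
    """Incrementally load newly arrived files into a Bronze Delta table."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(schema_location))
        .load(source_path)
    )
    return (
        stream.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)  # tracks processed files
        .option("mergeSchema", "true")                  # accept evolved columns
        .trigger(availableNow=True)                     # drain the backlog, then stop
        .toTable(bronze_table)
    )


def dedupe_latest(df, keys, order_col):
    """Keep only the most recent row per business key."""
    from pyspark.sql import Window, functions as F

    w = Window.partitionBy(*keys).orderBy(F.col(order_col).desc())
    return (
        df.withColumn("_rn", F.row_number().over(w))
        .filter(F.col("_rn") == 1)
        .drop("_rn")
    )


def merge_condition(keys, target="t", source="s"):
    """Build the join predicate for a MERGE, e.g. 't.id = s.id'."""
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def upsert_to_silver(spark, updates_df, silver_table, keys):
    """CDC-style upsert: update matching rows, insert new ones, no full reload."""
    from delta.tables import DeltaTable

    (
        DeltaTable.forName(spark, silver_table).alias("t")
        .merge(updates_df.alias("s"), merge_condition(keys))
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
```

In a Databricks notebook, where `spark` is already defined, the Silver step might be run as `upsert_to_silver(spark, dedupe_latest(bronze_df, ["track_id"], "updated_at"), "silver.tracks", ["track_id"])`, with `bronze_df` and the table name again being placeholders. Deduplicating before the merge matters because a Delta MERGE fails when several source rows match the same target row.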
Tech Stack
- Azure Databricks (PySpark)
- Delta Lake
- Azure Data Lake Storage
- Spotify API
- Power BI
- Git & GitHub