Spotify Azure Data Engineering Pipeline

Medallion Architecture + Power BI Analytics

Overview

An end-to-end Spotify data pipeline built with PySpark on Azure Databricks. It follows the Bronze, Silver, Gold (Medallion) architecture: raw ingestion in Bronze, cleaned Delta tables in Silver, and analytics-ready models in Gold. The Gold datasets power an interactive Power BI dashboard with insights on tracks, artists, and listening trends.

- Data Pipeline Repo

- Power BI Repo

What I Built

- Ingested Spotify data into the Bronze layer (raw storage)
- Handled incremental ingestion using Auto Loader
- Cleaned, deduplicated, and merged data in the Silver layer with Delta Lake
- Modeled a star schema (fact + dimension tables) in the Gold layer
- Supported backfilling, schema changes, and CDC-style updates (no full reload)
- Built Power BI dashboards for analytics and visualization
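
As a rough illustration of the Bronze and Silver steps listed above, here is a minimal sketch of incremental ingestion with Auto Loader followed by a deduplicating Delta Lake merge. It is not the code from the repo: the function names, the `json` source format, the key and ordering columns, and all paths are assumptions chosen for the example.

```python
from typing import Dict, List


def autoloader_options(schema_location: str, file_format: str = "json") -> Dict[str, str]:
    """Auto Loader (cloudFiles) reader options. The format is an assumed default."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader tracks the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Add newly seen columns to the schema instead of failing permanently.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def read_bronze_stream(spark, source_path: str, schema_location: str):
    """Incrementally read new files from source_path (Databricks only)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location))
        .load(source_path)
    )


def merge_condition(key_cols: List[str], target: str = "t", source: str = "s") -> str:
    """Build the join predicate for a MERGE, e.g. 't.id = s.id AND ...'."""
    return " AND ".join(f"{target}.{c} = {source}.{c}" for c in key_cols)


def upsert_to_silver(spark, updates_df, target_path: str,
                     key_cols: List[str], order_col: str) -> None:
    """Keep the latest row per key, then upsert it into a Delta table.

    Expects a static DataFrame (for example, one micro-batch).
    """
    # Imported lazily so the helpers above work without a Spark installation.
    from delta.tables import DeltaTable
    from pyspark.sql import Window, functions as F

    # Deduplicate: rank rows per key by recency and keep only the newest.
    w = Window.partitionBy(*key_cols).orderBy(F.col(order_col).desc())
    latest = (
        updates_df.withColumn("_rn", F.row_number().over(w))
        .filter(F.col("_rn") == 1)
        .drop("_rn")
    )

    # CDC-style upsert: update matching keys, insert new ones, no full reload.
    (
        DeltaTable.forPath(spark, target_path).alias("t")
        .merge(latest.alias("s"), merge_condition(key_cols))
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
```

With a streaming source, a function like `upsert_to_silver` would typically be applied to each micro-batch through `foreachBatch`, since the window-based deduplication needs a static DataFrame. A call might look like `upsert_to_silver(spark, batch_df, silver_path, ["track_id"], "updated_at")`, where `track_id` and `updated_at` are hypothetical column names.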

Tech Stack

- Azure Databricks (PySpark)
- Delta Lake
- Azure Data Lake Storage
- Spotify API
- Power BI
- Git & GitHub