Spotify Azure Data Engineering Pipeline

Medallion Architecture + Power BI Analytics


Overview

This project is an end-to-end Spotify data pipeline built with PySpark on Azure Databricks. It follows the Medallion (Bronze, Silver, Gold) architecture: raw ingestion in Bronze, cleaned Delta tables in Silver, and analytics-ready models in Gold. The Gold datasets feed an interactive Power BI dashboard with insights on tracks, artists, and listening trends.
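
To make the layer responsibilities concrete, here is a minimal plain-Python sketch of the Bronze to Silver to Gold flow. It is not the project's PySpark code: the field names (track_id, track_name, artist, plays), the sample rows, and the helper functions to_silver and to_gold are all hypothetical, chosen only to show what each layer is responsible for.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in Bronze.
# Field names and values are assumptions for illustration only.
bronze = [
    {"track_id": "t1", "track_name": " Song A ", "artist": "Artist X", "plays": "10"},
    {"track_id": "t2", "track_name": "Song B", "artist": "Artist Y", "plays": "5"},
    {"track_id": None, "track_name": "Broken row", "artist": "Artist X", "plays": "1"},
    {"track_id": "t3", "track_name": "Song C", "artist": "Artist X", "plays": "7"},
]


def to_silver(rows):
    """Silver: drop rows without a key, trim strings, cast types."""
    cleaned = []
    for row in rows:
        if row["track_id"] is None:
            continue  # unusable without a business key
        cleaned.append({
            "track_id": row["track_id"],
            "track_name": row["track_name"].strip(),
            "artist": row["artist"].strip(),
            "plays": int(row["plays"]),
        })
    return cleaned


def to_gold(rows):
    """Gold: aggregate into an analytics-ready shape (total plays per artist)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["artist"]] += row["plays"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze keeps everything exactly as received, including the broken row, so that Silver can be rebuilt from it if the cleaning rules change later.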

What I Built

  • Ingested Spotify data into the Bronze layer (raw storage)
  • Handled incremental ingestion using Auto Loader
  • Cleaned, deduplicated, and merged data in the Silver layer with Delta Lake
  • Modeled a star schema (fact + dimension tables) in the Gold layer
  • Supported backfills, schema changes, and CDC-style updates without full reloads
  • Built Power BI dashboards for analytics and visualization
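
The deduplicate-and-merge step in the Silver layer can be sketched in plain Python as below. This only models the semantics of a CDC-style upsert (update matching keys, insert new ones, never reload everything); on Databricks the same idea would normally be expressed as a Delta Lake merge. The column names (track_id, popularity, updated_at) and the helpers deduplicate and upsert are hypothetical.

```python
def deduplicate(rows, key, order_by):
    """Keep only the most recent row per key (largest order_by value wins)."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


def upsert(target, batch, key, order_by):
    """Merge a batch into target: update matched keys if not older, insert the rest."""
    merged = {row[key]: row for row in target}
    for row in deduplicate(batch, key, order_by):
        existing = merged.get(row[key])
        if existing is None or row[order_by] >= existing[order_by]:
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])


# Hypothetical existing Silver rows. ISO date strings compare correctly as text.
silver = [
    {"track_id": "t1", "popularity": 50, "updated_at": "2024-01-01"},
    {"track_id": "t2", "popularity": 30, "updated_at": "2024-01-01"},
]

# Hypothetical incremental batch: two versions of t2 and a brand-new t3.
batch = [
    {"track_id": "t2", "popularity": 35, "updated_at": "2024-01-02"},
    {"track_id": "t2", "popularity": 40, "updated_at": "2024-01-03"},
    {"track_id": "t3", "popularity": 10, "updated_at": "2024-01-03"},
]

silver = upsert(silver, batch, key="track_id", order_by="updated_at")
```

Deduplicating the batch before merging matters because a merge that sees two source rows for the same key has no well-defined winner; picking the latest version first keeps the result deterministic. Untouched rows such as t1 are left as they are, which is what makes the update incremental.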

Tech Stack

  • Azure Databricks (PySpark)
  • Delta Lake
  • Azure Data Lake Storage
  • Spotify API
  • Power BI
  • Git & GitHub