
Crypto Market Data Engineering Pipeline
Scalable Data Platform for Cryptocurrency Analytics
Overview
An end-to-end data engineering pipeline that ingests cryptocurrency market data from a public API, transforms it through structured layers, and generates analytical metrics on scheduled workflows. Raw data is stored separately from cleaned and derived datasets, which keeps the pipeline reliable and its outputs reproducible from the original payloads.
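The separation between raw and cleaned data can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the key layout in `bronze_key`, the helper `to_silver_rows`, the output column names, and the input field names (`id`, `current_price`, `total_volume`) are all assumptions modeled on a markets-style JSON response.

```python
import json
from datetime import date


def bronze_key(run_date: date, source: str = "coingecko") -> str:
    """Build an object-storage key for one day's raw payload.

    The date-partitioned layout is a hypothetical convention.
    """
    return f"bronze/{source}/dt={run_date.isoformat()}/markets.json"


def to_silver_rows(raw_json: str, run_date: date) -> list:
    """Flatten a raw JSON payload into typed rows for a relational table.

    Records without a price are skipped; the raw payload itself is never
    modified, so the rows can be regenerated at any time.
    """
    rows = []
    for item in json.loads(raw_json):
        if item.get("current_price") is None:
            continue  # assumed rule: drop records that carry no price
        rows.append(
            {
                "coin_id": item["id"],
                "price_date": run_date.isoformat(),
                "price_usd": float(item["current_price"]),
                "volume_usd": float(item.get("total_volume") or 0.0),
            }
        )
    return rows
```

Because the Bronze object is written once and only read afterwards, a change to the cleaning rules needs only a rerun of `to_silver_rows` over the stored payloads, with no new API calls.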
What It Does
- Pulls daily market data (e.g., prices, volumes) from the CoinGecko API
- Stores raw JSON in object storage (Bronze layer)
- Cleans and structures data into PostgreSQL (Silver layer)
- Generates daily metrics (returns, moving averages) in a Gold layer
- Schedules and manages every step with Apache Airflow workflows
- Runs fully containerized with Docker for reproducible deployment
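The Gold-layer metrics listed above could be computed along these lines. This is a sketch under stated assumptions: the function names, the use of simple (not log) returns, and the window length are illustrative choices, not details confirmed by the project.

```python
def daily_returns(prices):
    """Simple daily return r_t = p_t / p_(t-1) - 1.

    The first day has no previous price, so its return is None.
    """
    if not prices:
        return []
    out = [None]
    for prev, cur in zip(prices, prices[1:]):
        out.append(cur / prev - 1.0)
    return out


def moving_average(prices, window):
    """Trailing simple moving average over `window` days.

    Positions with fewer than `window` observations are None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out
```

In the pipeline these calculations would more likely run as SQL window functions in PostgreSQL or as a dataframe operation; the plain-Python version is only meant to make the definitions explicit.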
Tech Stack
- Apache Airflow – workflow orchestration
- PostgreSQL – structured storage & analytics
- MinIO – object storage for raw data
- Docker – containerization
- CoinGecko API – crypto market data source (pulled daily)
- Python – scripting and pipeline logic
Key Highlights
- Implements layered data engineering architecture (Bronze/Silver/Gold)
- Demonstrates data orchestration with Airflow
- Works with object storage and relational databases
- Uses a containerized setup for portability and reproducibility
- Serves as a worked example of ETL pipeline design for analytics