Project 6 Preview

Malaysia Air Pollution Index Prediction Using Deep Learning

Forecasting Air Quality with Multi-Architecture Deep Learning


Project Overview

Built a deep learning forecasting system to predict air pollution levels in Malaysia using 6 years of monthly pollutant data (2017–2022). The model forecasts six key pollutants (CO, NO₂, O₃, SO₂, PM10, PM2.5) to support early-warning systems and public-health planning. The project focuses on time-series modeling, model comparison, and performance optimization.

What I Did

1️⃣ Time-Series Engineering

  • Converted the raw series into a supervised-learning format using a 6-step (6-month) sliding-window lookback
  • Handled missing values with per-pollutant median imputation
  • Applied MinMaxScaler normalization to rescale every pollutant to a common [0, 1] range
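The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: it uses randomly generated monthly values in place of the real dataset, and the helper names (`impute_median`, `make_windows`), the 60/12 train/test split, and fitting the scaler on the training months only are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

POLLUTANTS = ["CO", "NO2", "O3", "SO2", "PM10", "PM2.5"]
LOOKBACK = 6  # 6 monthly steps, matching the lookback described above


def impute_median(df: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps in each pollutant column with that column's own median."""
    return df.fillna(df.median(numeric_only=True))


def make_windows(values: np.ndarray, lookback: int = LOOKBACK):
    """Turn a (time, features) array into supervised (X, y) pairs.

    X[i] holds rows i .. i+lookback-1; y[i] is row i+lookback (the next month).
    """
    X, y = [], []
    for i in range(len(values) - lookback):
        X.append(values[i : i + lookback])
        y.append(values[i + lookback])
    return np.array(X), np.array(y)


# Synthetic stand-in for 72 months of data (hypothetical values, not real readings)
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=72, freq="MS")
df = pd.DataFrame(rng.random((72, 6)), index=dates, columns=POLLUTANTS)
df.iloc[[3, 10], 0] = np.nan  # simulate missing CO readings

df = impute_median(df)

# Assumed split: first 60 months for training, last 12 for testing
split = 60
scaler = MinMaxScaler()  # default feature_range is (0, 1)
train_scaled = scaler.fit_transform(df.iloc[:split])
test_scaled = scaler.transform(df.iloc[split:])

X_train, y_train = make_windows(train_scaled)
print(X_train.shape, y_train.shape)
```

Fitting MinMaxScaler on the training months only keeps test-period minima and maxima from leaking into training; `scaler.inverse_transform` maps scaled predictions back to original units for evaluation.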

2️⃣ Advanced EDA & Feature Understanding

  • Identified seasonal patterns (e.g., recurring PM and O₃ peaks at different times of the year)
  • Used a Pearson correlation matrix to detect strongly related pollutants (e.g., PM10 ↔ PM2.5)
  • Used Random Forest feature importances to identify the most informative predictors
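Both EDA checks can be sketched in a few lines. This example assumes a synthetic monthly DataFrame in which PM2.5 is deliberately constructed to track PM10, and uses PM2.5 as a hypothetical target for the Random Forest; the project's actual target and model settings are not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly data (hypothetical); PM2.5 is built to follow PM10
rng = np.random.default_rng(42)
n = 72
pm10 = rng.uniform(20, 80, n)
df = pd.DataFrame({
    "CO": rng.uniform(0.3, 1.5, n),
    "NO2": rng.uniform(5, 30, n),
    "O3": rng.uniform(10, 50, n),
    "SO2": rng.uniform(1, 5, n),
    "PM10": pm10,
    "PM2.5": 0.6 * pm10 + rng.normal(0, 2, n),
})

# Pearson correlation matrix across all pollutants
corr = df.corr(method="pearson")
print("PM10 vs PM2.5 correlation:", round(corr.loc["PM10", "PM2.5"], 2))

# Random Forest feature importances for the assumed PM2.5 target
features = df.drop(columns="PM2.5")
target = df["PM2.5"]
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(features, target)

importances = pd.Series(rf.feature_importances_, index=features.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

scikit-learn's `feature_importances_` are impurity-based and can overstate features with many distinct values; `sklearn.inspection.permutation_importance` is a common cross-check.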

3️⃣ Deep Learning Model Comparison

  • Developed and compared four architectures: LSTM, MLP, 1D CNN, and a hybrid model
  • Tuned hyperparameters with Bayesian optimization (Keras Tuner)
  • Evaluated each model with MSE, MAE, RMSE, and R²
  • LSTM achieved the most consistent performance across pollutants
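The four evaluation metrics listed above can be computed per pollutant with plain NumPy. This sketch assumes targets and predictions are arrays of shape (samples, pollutants), ideally already inverse-transformed to original units; `regression_metrics` and `evaluate_per_pollutant` are hypothetical helper names, and the numbers are made up.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """MSE, MAE, RMSE and R² for a single pollutant's forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    # R² is undefined when the target is constant; report NaN in that case
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}


def evaluate_per_pollutant(y_true, y_pred, names) -> dict:
    """Apply regression_metrics column by column; returns {pollutant: metrics}."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        name: regression_metrics(y_true[:, j], y_pred[:, j])
        for j, name in enumerate(names)
    }


# Toy example with two pollutants (illustrative numbers only)
y_true = np.array([[3.0, 10.0], [-0.5, 12.0], [2.0, 11.0], [7.0, 13.0]])
y_pred = np.array([[2.5, 10.0], [0.0, 12.0], [2.0, 11.0], [8.0, 13.0]])
results = evaluate_per_pollutant(y_true, y_pred, ["CO", "NO2"])
for name, m in results.items():
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Running the same function on each architecture's test-set predictions and comparing R² across pollutants is one simple way to judge which model performs most consistently.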

Tools & Tech Stack

  • Python
  • TensorFlow / Keras
  • Keras Tuner (Bayesian Optimization)
  • Scikit-learn
  • Pandas / NumPy
  • Matplotlib / Seaborn
  • Random Forest (feature importance)
  • Time-series sliding window engineering

Impact-Oriented Framing

Designed and optimized a multi-architecture deep learning system for air-quality forecasting, in which a Bayesian-tuned LSTM delivered the most consistent prediction accuracy across pollutants.

Highlight Skills:
  • Time-series forecasting
  • Model benchmarking & comparison
  • Hyperparameter optimization
  • Feature engineering
  • API-based model deployment
  • End-to-end ML pipeline