Malaysia Air Pollution Index Prediction Using Deep Learning

Forecasting Air Quality with Multi-Architecture Deep Learning

Project Overview

Built a deep learning forecasting system to predict air pollution levels in Malaysia from 6 years of monthly pollutant data (2017–2022). The system forecasts six key pollutants (CO, NO₂, O₃, SO₂, PM10, PM2.5) to support early warning systems and public health planning. The project focuses on time-series modeling, model comparison, and performance optimization.

What I Did

1️⃣ Time-Series Engineering

Converted the raw monthly series into a supervised learning format using a 6-step (6-month) sliding-window lookback
Imputed missing values with the per-pollutant median
Applied MinMaxScaler normalization to bring all pollutants onto a common scale
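
The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, and the names `POLLUTANTS`, `LOOKBACK`, and `make_windows` are hypothetical helpers introduced only for this example. It assumes one-step-ahead forecasting of all six pollutants at once.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

POLLUTANTS = ["CO", "NO2", "O3", "SO2", "PM10", "PM2.5"]  # assumed column names
LOOKBACK = 6  # months of history fed to the model per sample


def make_windows(values, lookback=LOOKBACK):
    """Turn a (time, features) array into (X, y) pairs for one-step-ahead forecasting."""
    X, y = [], []
    for start in range(len(values) - lookback):
        X.append(values[start:start + lookback])   # the previous `lookback` months
        y.append(values[start + lookback])         # the month to predict
    return np.array(X), np.array(y)


# Synthetic stand-in for the monthly data: 72 months (2017-2022), 6 pollutants.
rng = np.random.default_rng(0)
index = pd.date_range("2017-01-01", periods=72, freq="MS")
df = pd.DataFrame(rng.uniform(1.0, 50.0, size=(72, 6)), index=index, columns=POLLUTANTS)
df.iloc[5, 0] = np.nan    # simulate missing readings
df.iloc[20, 4] = np.nan

# Median imputation, with one median computed per pollutant column.
df = df.fillna(df.median())

# Min-max scaling maps each pollutant to the [0, 1] range.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(df)

X, y = make_windows(scaled)
print(X.shape, y.shape)  # (66, 6, 6) (66, 6)
```

Each sample in `X` has shape (lookback, pollutants), which is the layout Keras recurrent and 1D convolutional layers expect. In a real pipeline it is usually safer to fit the scaler on the training portion only and reuse it on later months, so that no information from the test period leaks into training.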

2️⃣ Advanced EDA & Feature Understanding

Identified seasonal patterns (e.g., PM and O₃ peaking at different times of year)
Used a Pearson correlation matrix to detect strong relationships (e.g., PM10 ↔ PM2.5)
Applied Random Forest feature importance to identify key predictors
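
A rough sketch of the correlation and feature-importance steps is shown below. The data is synthetic (PM10 is deliberately constructed to track PM2.5), and the choice of PM2.5 as the prediction target and the forest settings are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly readings; PM10 is built to follow PM2.5 closely.
rng = np.random.default_rng(42)
n = 72
pm25 = rng.uniform(10, 60, n)
df = pd.DataFrame({
    "PM2.5": pm25,
    "PM10": 1.5 * pm25 + rng.normal(0, 2, n),
    "CO": rng.uniform(0.2, 1.5, n),
    "NO2": rng.uniform(5, 30, n),
    "O3": rng.uniform(10, 50, n),
    "SO2": rng.uniform(1, 5, n),
})

# Pearson correlation matrix across all pollutants.
corr = df.corr(method="pearson")
pm_corr = corr.loc["PM10", "PM2.5"]
print(f"PM10 vs PM2.5 correlation: {pm_corr:.3f}")

# Random Forest feature importance for one target pollutant (assumed: PM2.5).
target = "PM2.5"
features = df.drop(columns=[target])
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(features, df[target])

importance = pd.Series(forest.feature_importances_, index=features.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```

The importances sum to 1, so each value can be read as that predictor's share of the forest's total impurity reduction. Strongly correlated predictors tend to split importance between them, which is one reason to read the ranking alongside the correlation matrix rather than on its own.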

3️⃣ Deep Learning Model Comparison

Developed and compared four architectures: LSTM, MLP, 1D CNN, and a hybrid model
Applied Bayesian Optimization (Keras Tuner) for hyperparameter tuning
Evaluated each model with MSE, MAE, RMSE, and R²
The LSTM achieved the most consistent performance across pollutants
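
For reference, the four evaluation metrics listed above can be computed as in the sketch below. The function name `regression_metrics` and the sample values are made up for illustration; the formulas are the standard definitions.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE, and R² for one pollutant's predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(mse)

    # R² compares the model's squared error with that of always predicting the mean.
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}


# Made-up values, only to show the call.
scores = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 8.0, 9.5])
print(scores)  # MSE 0.375, MAE 0.5, RMSE ≈ 0.612, R2 0.925
```

When models are compared on MinMax-scaled data, MSE, MAE, and RMSE are expressed in scaled units; applying the scaler's inverse transform to predictions first gives errors in the original pollutant units. R² is unaffected by that rescaling.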

Tools & Tech Stack

Python
TensorFlow / Keras
Keras Tuner (Bayesian Optimization)
Scikit-learn
Pandas / NumPy
Matplotlib / Seaborn
Random Forest (feature importance)
Time-series sliding window engineering

Impact-Oriented Framing

Designed and optimized a multi-architecture deep learning system for air quality forecasting, in which an LSTM with Bayesian hyperparameter tuning gave the most consistent prediction performance across pollutants.

Highlighted Skills

Time-series forecasting
Model benchmarking & comparison
Hyperparameter optimization
Feature engineering
API-based model deployment
End-to-end ML pipeline