
Malaysia Air Pollution Index Prediction Using Deep Learning
Forecasting Air Quality with Multi-Architecture Deep Learning
Project Overview
Built a deep learning forecasting system to predict air pollution levels in Malaysia from 6 years of monthly pollutant data (2017–2022). The models forecast six key pollutants (CO, NO₂, O₃, SO₂, PM10, PM2.5) to support early warning systems and public health planning. The project focuses on time-series modeling, model comparison, and performance optimization.
What I Did
1️⃣ Time-Series Engineering
- Converted raw data into a supervised learning format using a 6-step sliding-window lookback (each input holds the previous 6 monthly readings)
- Handled missing values using median imputation by pollutant
- Applied MinMaxScaler to rescale all pollutants to a common range
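The three preprocessing steps above can be sketched as follows. This is a minimal NumPy-only illustration on synthetic data, not the project's actual code: the helper names (`impute_median`, `min_max_scale`, `make_windows`), the random data, and the choice of the next month as the prediction target are all assumptions. The scaling helper mirrors what scikit-learn's `MinMaxScaler` does with its default `[0, 1]` range.

```python
import numpy as np

POLLUTANTS = ["CO", "NO2", "O3", "SO2", "PM10", "PM2.5"]
LOOKBACK = 6  # 6-step lookback, as described above


def impute_median(data: np.ndarray) -> np.ndarray:
    """Replace NaNs in each column (pollutant) with that column's median."""
    filled = data.copy()
    medians = np.nanmedian(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = medians[cols]
    return filled


def min_max_scale(data: np.ndarray) -> np.ndarray:
    """Rescale each column to [0, 1] (hand-rolled stand-in for MinMaxScaler)."""
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (data - lo) / span


def make_windows(data: np.ndarray, lookback: int = LOOKBACK):
    """Build (X, y): X[i] holds `lookback` consecutive rows, y[i] is the row after them."""
    X = np.stack([data[i : i + lookback] for i in range(len(data) - lookback)])
    y = data[lookback:]
    return X, y


# Synthetic stand-in for 72 monthly readings (2017-2022) of 6 pollutants.
rng = np.random.default_rng(0)
raw = rng.uniform(1.0, 50.0, size=(72, len(POLLUTANTS)))
raw[5, 2] = np.nan  # simulate one missing O3 reading

clean = impute_median(raw)
scaled = min_max_scale(clean)
X, y = make_windows(scaled)
print(X.shape, y.shape)  # (66, 6, 6) (66, 6)
```

With 72 monthly rows and a lookback of 6, the windowing yields 66 samples shaped `(lookback, n_pollutants)`, which is the 3D input layout an LSTM or 1D CNN expects. In a real pipeline the scaling statistics would normally be computed on the training split only, to avoid leaking information from the test period.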
2️⃣ Advanced EDA & Feature Understanding
- Identified seasonal patterns (e.g., PM and O₃ peaking at different times of the year)
- Used a Pearson correlation matrix to detect strong relationships (e.g., PM10 ↔ PM2.5)
- Applied Random Forest feature importance to identify key predictors
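The correlation step above can be illustrated with a short NumPy sketch. The data here is synthetic, with PM2.5 deliberately constructed to track PM10 so that the pair shows up as strongly correlated; the `strong_pairs` helper and the 0.8 threshold are assumptions for illustration, not values from the project. The Random Forest importance step is not covered here.

```python
import numpy as np

POLLUTANTS = ["CO", "NO2", "O3", "SO2", "PM10", "PM2.5"]

# Synthetic data: 72 monthly rows, one column per pollutant.
rng = np.random.default_rng(42)
data = rng.normal(size=(72, len(POLLUTANTS)))
# Make PM2.5 (column 5) follow PM10 (column 4) plus a little noise.
data[:, 5] = 0.6 * data[:, 4] + 0.05 * rng.normal(size=72)

# Pearson correlation matrix; rowvar=False treats columns as variables.
corr = np.corrcoef(data, rowvar=False)


def strong_pairs(corr: np.ndarray, names: list, threshold: float = 0.8) -> list:
    """Return (name_a, name_b, r) for each distinct pair with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


pairs = strong_pairs(corr, POLLUTANTS)
for a, b, r in pairs:
    print(f"{a} <-> {b}: r = {r:.3f}")
```

Only the upper triangle of the matrix is scanned, since Pearson correlation is symmetric and the diagonal is always 1.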
3️⃣ Deep Learning Model Comparison
- Developed and compared 4 architectures: LSTM, MLP, 1D CNN, and a hybrid model
- Applied Bayesian Optimization (Keras Tuner) for hyperparameter tuning
- Evaluated each model with MSE, MAE, RMSE, and R²
- LSTM achieved the most consistent performance across pollutants
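The four evaluation metrics listed above can be computed per pollutant as in the sketch below. This is a NumPy-only illustration with made-up numbers; the function name `regression_metrics` and the two-column toy arrays are assumptions, and scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` would be the usual library equivalents.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute MSE, MAE, RMSE and R2 separately for each column (pollutant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mse = np.mean(err ** 2, axis=0)
    mae = np.mean(np.abs(err), axis=0)
    rmse = np.sqrt(mse)

    # R2 = 1 - SS_res / SS_tot; undefined if a column of y_true is constant.
    ss_res = np.sum(err ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}


# Toy example: column 0 is predicted perfectly, column 1 is off by 2 every time.
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y_pred = np.array([[1.0, 12.0], [2.0, 18.0], [3.0, 32.0], [4.0, 38.0]])

metrics = regression_metrics(y_true, y_pred)
for name, values in metrics.items():
    print(name, values)
```

Keeping the metrics per column rather than averaging them makes it possible to check how consistently a model performs across pollutants, which is the basis of the comparison described above.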
Tools & Tech Stack
- Python
- TensorFlow / Keras
- Keras Tuner (Bayesian Optimization)
- Scikit-learn
- Pandas / NumPy
- Matplotlib / Seaborn
- Random Forest (feature importance)
- Time-series sliding window engineering
Impact-Oriented Framing
Designed and optimized a multi-architecture deep learning system for air quality forecasting; an LSTM tuned with Bayesian hyperparameter optimization gave the most consistent performance across pollutants.
Highlight Skills:
- Time-series forecasting
- Model benchmarking & comparison
- Hyperparameter optimization
- Feature engineering
- API-based model deployment
- End-to-end ML pipeline