
Movie Review Sentiment Analysis
Using Machine Learning & Transformer Models
Project Overview
Built a sentiment classification system that labels movie reviews as Positive or Negative. The project compares a traditional machine-learning baseline with transformer models to evaluate whether contextual language understanding improves sentiment detection.
Focus areas:
- Text classification
- NLP preprocessing
- Model benchmarking
- Transformer fine-tuning
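As a hedged illustration of the preprocessing step, the sketch below assumes the raw reviews contain HTML line breaks (`<br />`) and escaped entities, as the public IMDb review texts commonly do. `clean_review` is a hypothetical helper, not the project's code. It leaves lowercasing to the downstream models, since scikit-learn's `TfidfVectorizer` lowercases by default.

```python
import html
import re

# Hypothetical cleaning step; the regexes are illustrative assumptions.
_TAG_RE = re.compile(r"<[^>]+>")   # matches tags such as <br />
_SPACE_RE = re.compile(r"\s+")     # runs of spaces, tabs, newlines


def clean_review(text: str) -> str:
    """Strip HTML tags, unescape entities, and collapse whitespace."""
    text = _TAG_RE.sub(" ", text)   # replace each tag with a space
    text = html.unescape(text)      # e.g. "&amp;" becomes "&"
    return _SPACE_RE.sub(" ", text).strip()


print(clean_review("Great film!<br /><br />Loved the cast &amp; score."))
# Great film! Loved the cast & score.
```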
Models Implemented
- Baseline Model
  - TF-IDF (unigrams + bigrams)
  - Logistic Regression
- Pre-trained DistilBERT
  - Used without fine-tuning
- Fine-Tuned DistilBERT
  - Fine-tuned on the IMDb dataset
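The baseline can be sketched as a scikit-learn pipeline. This is a minimal sketch, not the project's code: the toy sentences stand in for IMDb reviews, and the 1 = Positive / 0 = Negative encoding and `max_iter=1000` are assumptions. Only the unigram + bigram setting (`ngram_range=(1, 2)`) and the choice of Logistic Regression come from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Toy stand-in data (assumption); the project itself used IMDb reviews.
train_texts = [
    "a wonderful, moving film",
    "great acting and a great story",
    "i loved every minute of it",
    "terrible plot and wooden acting",
    "a boring waste of time",
    "i hated the ending",
]
train_labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = Positive, 0 = Negative

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigrams + bigrams
    LogisticRegression(max_iter=1000),    # max_iter is an assumed setting
)
baseline.fit(train_texts, train_labels)

test_texts = ["a great and moving story", "boring and terrible"]
test_labels = [1, 0]
preds = baseline.predict(test_texts)
print("predictions:", preds.tolist())
print("accuracy:", accuracy_score(test_labels, preds))
```

Bundling the vectorizer and classifier in one pipeline means the TF-IDF vocabulary and IDF weights are learned from the training texts only and reused unchanged at prediction time.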
Performance (Test Set)
- Baseline (TF-IDF + LR) → 90.25% Accuracy
- Pre-trained DistilBERT → 83.15% Accuracy
- Fine-Tuned DistilBERT → 86.80% Accuracy
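Key insight: the pre-trained DistilBERT, used without adaptation, scored lowest (83.15%). Fine-tuning raised it by 3.65 percentage points (86.80%), and the much simpler TF-IDF baseline still scored highest (90.25%). In this comparison, model adaptation (fine-tuning) mattered more than model complexity.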
Tech Stack
- Python
- Pandas
- scikit-learn
- TF-IDF Vectorizer
- Logistic Regression
- Hugging Face Transformers
- DistilBERT
- PyTorch (via the Transformers Trainer API)
- Evaluation: Accuracy, F1-score
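For reference, both metrics can be computed from scratch. This is a minimal sketch assuming binary labels with 1 = Positive. scikit-learn's `accuracy_score` and `f1_score` (default `average="binary"`, `pos_label=1`) compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true, y_pred, positive=1):
    """Binary F1 for the positive class.

    Uses 2*TP / (2*TP + FP + FN), which equals the harmonic mean of
    precision and recall whenever that mean is defined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1]
print(round(accuracy(y_true, y_pred), 4))  # 0.6667
print(f1(y_true, y_pred))                  # 0.75
```

Here every positive review is found (recall 1.0) but two negatives are mislabeled (precision 0.6), so F1 (0.75) and accuracy (0.6667) diverge, which is why the two are reported together.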