Movie Review Sentiment Analysis

Using Machine Learning & Transformer Models

Project Overview

This project builds a sentiment classification system that labels movie reviews as Positive or Negative. It compares a traditional machine learning baseline with transformer models to evaluate whether contextual language understanding improves sentiment detection.

- Text classification
- NLP preprocessing
- Model benchmarking
- Transformer fine-tuning

Models Implemented

Baseline Model
- TF-IDF (unigrams + bigrams)
- Logistic Regression
Pre-trained DistilBERT
- Used without fine-tuning
Fine-Tuned DistilBERT
- Fine-tuned on IMDb dataset
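
The baseline listed above could be assembled roughly as follows. This is a minimal sketch, not the project's actual code: it assumes scikit-learn's `TfidfVectorizer` with `ngram_range=(1, 2)` for unigrams plus bigrams and a default `LogisticRegression`. The toy reviews, labels, and `max_iter` value are hypothetical placeholders rather than the IMDb data or the settings used here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for IMDb reviews (1 = Positive, 0 = Negative).
train_texts = [
    "great acting and a moving story",
    "wonderful film with a brilliant cast",
    "boring plot and terrible dialogue",
    "a dull script and awful pacing",
]
train_labels = [1, 1, 0, 0]

# TF-IDF over unigrams and bigrams, followed by logistic regression.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),  # assumed value, not from the project
)
baseline.fit(train_texts, train_labels)

# Predict labels for unseen reviews.
new_reviews = ["brilliant story and great acting", "terrible pacing and a boring script"]
predictions = baseline.predict(new_reviews)
print(list(predictions))
```

Wrapping both steps in one pipeline keeps the vectorizer fitted on training text only, which avoids leaking test vocabulary into the features.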

Performance (Test Set)

Baseline (TF-IDF + LR) → 90.25% Accuracy
Pre-trained DistilBERT → 83.15% Accuracy
Fine-Tuned DistilBERT → 86.80% Accuracy

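Key insight: Model adaptation (fine-tuning) matters more than model complexity. Fine-tuning raised DistilBERT's accuracy from 83.15% to 86.80%, yet the simpler TF-IDF + Logistic Regression baseline still scored highest at 90.25%.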

Tech Stack

Python
Pandas
Scikit-learn
TF-IDF Vectorizer
Logistic Regression
HuggingFace Transformers
DistilBERT
PyTorch / HuggingFace Trainer API
Evaluation: Accuracy, F1-score
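
For reference, the two evaluation metrics can be computed by hand as in the sketch below. It assumes binary labels with 1 as the positive class; the function names and the example labels are illustrative, not taken from the project, which may well have used scikit-learn's built-in metrics instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Illustrative labels only (1 = Positive, 0 = Negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

print(accuracy(y_true, y_pred))   # 6 of 8 correct -> 0.75
print(f1_binary(y_true, y_pred))  # TP=3, FP=1, FN=1 -> 0.75
```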