Project 5 Preview

Movie Review Sentiment Analysis

Using Machine Learning & Transformer Models


Project Overview

Built a sentiment classification system that labels movie reviews as Positive or Negative. The project compares a traditional machine learning baseline with transformer models to evaluate how much contextual language understanding improves sentiment detection.

  • Text classification
  • NLP preprocessing
  • Model benchmarking
  • Transformer fine-tuning

Models Implemented

  1. Baseline Model
    • TF-IDF (unigrams + bigrams)
    • Logistic Regression
  2. Pre-trained DistilBERT
    • Used without fine-tuning
  3. Fine-Tuned DistilBERT
    • Fine-tuned on IMDb dataset
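
The baseline model can be sketched with scikit-learn as below. This is a minimal illustration, not the project's actual code: the four toy reviews stand in for the IMDb data, and the hyperparameters (max_iter=1000, default regularization) are assumptions. Only the unigram + bigram TF-IDF features and the Logistic Regression classifier come from the list above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical); the project itself used IMDb reviews.
train_texts = [
    "a wonderful, moving film with great acting",
    "great story and wonderful performances",
    "boring plot and terrible acting",
    "a terrible, boring waste of time",
]
train_labels = [1, 1, 0, 0]  # 1 = Positive, 0 = Negative

# ngram_range=(1, 2) extracts unigrams and bigrams, as in the baseline.
# max_iter=1000 is an assumed setting, not taken from the project.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

# Classify two unseen toy reviews.
predictions = baseline.predict(["wonderful film", "terrible plot"])
print(list(predictions))
```

Wrapping both steps in one pipeline means the vectorizer is fit only on training text, which avoids leaking test-set vocabulary into the features.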

Performance (Test Set)

  • Baseline (TF-IDF + LR) → 90.25% Accuracy
  • Pre-trained DistilBERT → 83.15% Accuracy
  • Fine-Tuned DistilBERT → 86.80% Accuracy

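Key insight: Fine-tuning lifted DistilBERT from 83.15% to 86.80% (+3.65 points), yet the simpler TF-IDF + Logistic Regression baseline scored highest at 90.25%. Model adaptation (fine-tuning) mattered more than model complexity.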

Tech Stack

  • Python
  • Pandas
  • Scikit-learn
  • TF-IDF Vectorizer
  • Logistic Regression
  • HuggingFace Transformers
  • DistilBERT
  • PyTorch / HuggingFace Trainer API
  • Evaluation: Accuracy, F1-score
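
For reference, the two evaluation metrics can be computed from their definitions as in the sketch below. The helper names accuracy and binary_f1 and the toy labels are hypothetical, chosen for illustration; scikit-learn's accuracy_score and f1_score compute the same quantities and are the more likely choice in practice.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = Positive, 0 = Negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

acc = accuracy(y_true, y_pred)
f1 = binary_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Accuracy alone can hide class imbalance, which is one reason to report F1 alongside it.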