{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "628b6232-95ff-4a68-a092-4eab67abc0cb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python 3.13.5\n" ] } ], "source": [ "!python --version" ] }, { "cell_type": "markdown", "id": "a7d1572d-1586-4e37-9a36-1779b401cd11", "metadata": {}, "source": [ "# PHASE 1: EXPLAIN & BREAKDOWN (LEARNING PHASE)\n", "\n", "## 1. Simple Explanation of Sequence Modeling\n", "\n", "Sequence modeling is about understanding and predicting patterns in ordered data where the position matters. Think of it like reading a sentence - each word depends on the previous words to make sense. In AI, we use sequence models to process data like text (predicting the next word), time series (stock prices over time), or speech (converting audio to text). The key insight is that sequences have temporal or positional dependencies - what comes before influences what comes next. 
These models learn to capture these relationships and can generate new sequences or make predictions about future elements. Common architectures include RNNs, LSTMs, and Transformers, each designed to handle different aspects of sequence dependencies.\n", "\n", "## 2. Detailed Roadmap with Examples\n", "\n", "**Step 1: Foundation Concepts**\n", "- **Understanding sequences and temporal dependencies**: \n", " - *Example*: In \"The cat sat on the ___\", the word \"mat\" is much more likely than \"elephant\" because of the context. The model learns that \"sat on\" often precedes furniture/surfaces.\n", " - *Time series example*: Stock prices - if Apple stock dropped 5% yesterday and the market is bearish, today's price is likely to be influenced by yesterday's drop.\n", "\n", "- **Types of sequence problems**:\n", " - *Sequence-to-sequence*: \"Hello world\" → \"Hola mundo\" (translation)\n", " - *Sequence-to-one*: \"This movie is amazing!\" → \"Positive\" (sentiment analysis)\n", " - *One-to-sequence*: [Image of dog] → \"A golden retriever running in a park\" (image captioning)\n", " - *Many-to-many*: [Audio waveform] → \"Hello how are you\" (speech recognition)\n", "\n", "- **Input/output representations and encoding**:\n", " - *Text example*: \"cat\" → token ID 156 → embedding vector [0.2, -0.1, 0.5, 0.8]\n", " - *Time series example*: Stock price $150.25 → normalized value 0.73 (after min-max scaling)\n", " - *One-hot example*: Word \"dog\" in vocab {cat:0, dog:1, bird:2} → [0, 1, 0]\n", "\n", "**Step 2: Basic RNN Architecture**\n", "- **Vanilla RNN structure and forward pass**:\n", " - *Example*: Processing \"I love pizza\"\n", " - Step 1: Process \"I\" → hidden state captures \"someone\"\n", " - Step 2: Process \"love\" + previous hidden → captures \"someone loves something\"\n", " - Step 3: Process \"pizza\" + previous hidden → captures \"someone loves pizza\"\n", "\n", "- **Hidden state concept and recurrent connections**:\n", " - *Example*: Reading \"The cat, 
which was black, meowed\"\n", " - After \"cat\": hidden state = [0.2, 0.8, 0.1] (represents \"cat\")\n", " - After \"which\": hidden state = [0.3, 0.7, 0.2] (represents \"cat which\")\n", " - After \"black\": hidden state = [0.4, 0.6, 0.3] (represents \"black cat\")\n", " - After \"meowed\": hidden state = [0.5, 0.5, 0.4] (represents \"black cat meowed\")\n", "\n", "- **Backpropagation through time (BPTT)**:\n", " - *Example*: Training on \"cat sat\"\n", " - Forward: \"cat\" → h₁ → predict \"sat\" → loss = 0.8\n", " - Backward: Error flows back through h₁ to update weights for \"cat\" processing\n", " - This happens for each time step in the sequence\n", "\n", "**Step 3: RNN Limitations & Solutions**\n", "- **Vanishing gradient problem**:\n", " - *Example*: In \"The cat that lived in Paris for many years finally came home\", by the time we reach \"came\", the gradient signal from \"cat\" has become too weak to learn the connection.\n", " - *Mathematical example*: If gradient = 0.1 and we multiply by 0.5 at each step, after 10 steps: 0.1 × 0.5¹⁰ = 0.0001 (vanished!)\n", "\n", "- **Long-term dependency issues**:\n", " - *Example*: \"I grew up in France... I speak fluent French\" - RNN forgets \"France\" by the time it reaches \"French\"\n", " - *Bad example*: \"I grew up in France... [500 words about other topics]... 
I speak fluent ___\" - RNN predicts \"English\" instead of \"French\"\n", "\n", "- **Introduction to gating mechanisms**:\n", " - *Example*: Like a water valve that can be fully open (1.0), fully closed (0.0), or partially open (0.7)\n", " - *Text example*: When processing \"However, the cat...\", the gate learns to reduce importance of previous positive sentiment because \"However\" signals a contrast\n", "\n", "**Step 4: LSTM Networks**\n", "- **Cell state vs hidden state**:\n", " - *Cell state example*: Long-term memory storing \"We're talking about cats\" throughout a paragraph\n", " - *Hidden state example*: Short-term memory storing \"currently processing the word 'fluffy'\" for immediate prediction\n", "\n", "- **Forget, input, and output gates**:\n", " - *Forget gate example*: In \"John is tall. Mary is short.\", when processing \"Mary\", forget gate removes \"John\" information\n", " - *Input gate example*: When seeing \"Mary\", input gate decides to store \"Mary is the new subject\"\n", " - *Output gate example*: When predicting next word after \"Mary is\", output gate decides which stored information to use\n", "\n", "- **Information flow through LSTM cells**:\n", " - *Complete example*: Processing \"The cat is black. 
The dog is white.\"\n", " - Step 1: \"cat\" → Cell stores \"animal=cat, color=unknown\"\n", " - Step 2: \"is\" → Cell keeps \"animal=cat\", prepares for attribute\n", " - Step 3: \"black\" → Cell updates to \"animal=cat, color=black\"\n", " - Step 4: \"dog\" → Forget gate removes cat info, Input gate adds \"animal=dog\"\n", " - Step 5: \"white\" → Cell becomes \"animal=dog, color=white\"\n", "\n", "**Step 5: GRU Networks**\n", "- **Simplified gating mechanism**:\n", " - *Example*: Instead of 3 gates (forget, input, output), GRU has 2 gates (reset, update)\n", " - *Reset gate example*: When processing \"But the dog...\", reset gate decides to forget previous \"cat\" information\n", " - *Update gate example*: Decides how much of new \"dog\" information to keep vs old information\n", "\n", "- **Comparison with LSTM**:\n", " - *Speed example*: A GRU layer has three weight blocks (reset, update, candidate) where an LSTM layer has four, so at the same hidden size it has roughly 25% fewer recurrent parameters and usually trains somewhat faster on the same hardware\n", " - *Performance example*: On simple tasks like sentiment analysis, GRU often performs similarly to LSTM\n", " - *Memory example*: The LSTM's separate cell state can help on very long sequences, while a GRU is often sufficient for shorter ones; neither is uniformly better, so compare both on the task at hand\n", "\n", "**Step 6: Advanced Architectures**\n", "- **Bidirectional RNNs**:\n", " - *Example*: \"The animal that I saw yesterday was a ___\"\n", " - Forward RNN: \"The animal that I saw yesterday was a\" → predicts based on left context\n", " - Backward RNN: \"cat\" ← \"a was yesterday saw I that animal The\" → predicts based on right context\n", " - Combined: Both directions agree on \"cat\" with high confidence\n", "\n", "- **Encoder-decoder models**:\n", " - *Translation example*: \"How are you?\"\n", " - Encoder: \"How\" → \"are\" → \"you?\" → context vector [0.2, 0.8, 0.1, 0.9]\n", " - Decoder: context vector → \"¿\" → \"Cómo\" → \"estás\" → \"?\"\n", "\n", "- **Attention mechanisms**:\n", " - *Example*: Translating \"The black cat sat on the red mat\"\n", " - When generating \"negro\" (black), attention focuses on \"black\" in source\n", " - When 
generating \"gato\" (cat), attention focuses on \"cat\" in source \n", " - When generating \"roja\" (red), attention focuses on \"red\" in source\n", "\n", "**Step 7: Modern Approaches**\n", "- **Transformer architecture basics**:\n", " - *Example*: Instead of processing \"I love pizza\" sequentially, Transformer processes all words simultaneously\n", " - *Parallel processing*: All positions computed at once instead of waiting for previous steps\n", " - *Self-attention example*: In \"The cat sat on it\", \"it\" attends strongly to \"cat\" to understand what \"it\" refers to\n", "\n", "- **Self-attention concept**:\n", " - *Example*: \"The animal didn't cross the street because it was too tired\"\n", " - \"it\" pays attention to \"animal\" (not \"street\") to understand the reference\n", " - Attention weights: \"it\" → \"animal\" = 0.9, \"it\" → \"street\" = 0.1\n", "\n", "- **Applications in different domains**:\n", " - *NLP*: GPT-3 generating human-like text: \"Once upon a time\" → entire story\n", " - *Vision*: Vision Transformer treating image patches like words in a sentence\n", " - *Code*: GitHub Copilot completing code: \"def fibonacci(\" → complete function implementation\n", " - *Biology*: AlphaFold predicting protein structure from amino acid sequences\n", "\n", "## 3. 
Key Formulas with Explanations\n", "\n", "**Vanilla RNN:**\n", "```\n", "h_t = tanh(W_hh * h_{t-1} + W_xh * x_t + b_h)\n", "y_t = W_hy * h_t + b_y\n", "```\n", "- `h_t`: Hidden state at time t (captures information from current and previous inputs)\n", "- `W_hh`: Weight matrix for hidden-to-hidden connections (learns temporal dependencies)\n", "- `W_xh`: Weight matrix for input-to-hidden connections (processes current input)\n", "- `W_hy`: Weight matrix for hidden-to-output connections (generates predictions)\n", "- `x_t`: Input at time t\n", "- `b_h, b_y`: Bias terms for hidden and output layers\n", "\n", "**LSTM Gates:**\n", "```\n", "f_t = σ(W_f * [h_{t-1}, x_t] + b_f) # Forget gate\n", "i_t = σ(W_i * [h_{t-1}, x_t] + b_i) # Input gate\n", "C̃_t = tanh(W_C * [h_{t-1}, x_t] + b_C) # Candidate values\n", "C_t = f_t * C_{t-1} + i_t * C̃_t # Cell state\n", "o_t = σ(W_o * [h_{t-1}, x_t] + b_o) # Output gate\n", "h_t = o_t * tanh(C_t) # Hidden state\n", "```\n", "- `σ`: Sigmoid function (outputs 0-1, acts as gate)\n", "- `f_t`: Forget gate (decides what to remove from cell state)\n", "- `i_t`: Input gate (decides what new information to store)\n", "- `C_t`: Cell state (long-term memory)\n", "- `o_t`: Output gate (decides what parts of cell state to output)\n", "\n", "## 4. 
Step-by-Step Numerical Example\n", "\n", "Let's trace the first two steps of a simple RNN on the sequence \"hello\":\n", "\n", "**Given:**\n", "- Vocabulary: {h:0, e:1, l:2, o:3}\n", "- Hidden size: 2\n", "- Input size: 4 (one-hot encoded)\n", "- Bias `b_h`: zero (omitted from the calculations below)\n", "- `W_xh` is stored as input size × hidden size (4×2), so the input term is `W_xh^T * x_t`, which for a one-hot `x_t` selects one row of `W_xh`\n", "\n", "**Initialization:**\n", "```\n", "W_hh = [[0.5, 0.3], [0.2, 0.7]]\n", "W_xh = [[0.1, 0.4], [0.3, 0.2], [0.5, 0.1], [0.2, 0.6]]\n", "h_0 = [0.0, 0.0]\n", "```\n", "\n", "**Step 1: Process 'h' (x_1 = [1,0,0,0])**\n", "```\n", "W_xh^T * x_1 = [0.1, 0.4] # row 0 of W_xh\n", "W_hh * h_0 = [[0.5, 0.3], [0.2, 0.7]] * [0.0, 0.0] = [0.0, 0.0]\n", "h_1 = tanh([0.1, 0.4]) = [0.100, 0.380]\n", "```\n", "\n", "**Step 2: Process 'e' (x_2 = [0,1,0,0])**\n", "```\n", "W_xh^T * x_2 = [0.3, 0.2] # row 1 of W_xh\n", "W_hh * h_1 = [0.5*0.100 + 0.3*0.380, 0.2*0.100 + 0.7*0.380] = [0.164, 0.286]\n", "h_2 = tanh([0.3+0.164, 0.2+0.286]) = tanh([0.464, 0.486]) = [0.433, 0.451]\n", "```\n", "\n", "This continues for each character, building up context in the hidden state.\n", "\n", "## 5. Real-World AI Use Case\n", "\n", "**Language Translation System:**\n", "Google Translate uses sequence-to-sequence models for translation. The encoder processes the source language sentence word by word, building a context representation. The decoder then generates the target language translation, considering both the source context and previously generated words. For example, translating \"How are you?\" to Spanish:\n", "\n", "- Encoder processes: \"How\" → \"are\" → \"you\" → \"?\"\n", "- Builds context vector capturing meaning\n", "- Decoder generates: \"¿\" → \"Cómo\" → \"estás\" → \"?\"\n", "\n", "The model learns relationships between languages, handling word order differences, idiomatic expressions, and context-dependent translations.\n", "\n", "## 6. 
Tips for Mastering Sequence Modeling\n", "\n", "**Practice Sources:**\n", "- Implement character-level text generation\n", "- Build sentiment analysis on movie reviews\n", "- Create time series forecasting models\n", "- Work with speech recognition datasets\n", "\n", "**Resources:**\n", "- \"Deep Learning\" by Goodfellow, Bengio, and Courville (Chapter 10)\n", "- Stanford CS224n NLP course materials\n", "- PyTorch RNN tutorials and documentation\n", "- Kaggle competitions: sentiment analysis, time series prediction\n", "\n", "**Key Practice Problems:**\n", "- Name generation using character RNNs\n", "- Stock price prediction with LSTM\n", "- Machine translation with attention\n", "- Chatbot development using sequence-to-sequence models\n", "\n", "**Debugging Tips:**\n", "- Always check tensor shapes at each step\n", "- Visualize hidden state evolution\n", "- Start with small sequences and simple models\n", "- Use teacher forcing during training for faster convergence" ] }, { "cell_type": "code", "execution_count": 4, "id": "4cada01e-01b7-442a-adca-ecfbac2f1ad7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2025-07-15 15:57:08,924 - Using device: mps\n", "2025-07-15 15:57:08,927 - Starting improved sequence modeling implementation\n", "2025-07-15 15:57:08,928 - Creating realistic movie review dataset with clear sentiment patterns\n", "2025-07-15 15:57:08,937 - Created 2000 realistic movie review samples\n", "2025-07-15 15:57:08,937 - Label distribution: Counter({1: 1000, 0: 1000})\n", "2025-07-15 15:57:08,937 - Sample positive review: I loved this comedy! The dialogue was amazing and the dialogue was fantastic.\n", "2025-07-15 15:57:08,938 - Sample negative review: The direction in this movie was dull. 
disliked the story and soundtrack.\n", "2025-07-15 15:57:08,938 - Building vocabulary from texts\n", "2025-07-15 15:57:08,940 - Vocabulary size: 150\n", "2025-07-15 15:57:08,940 - Most common words: [('was', 1956), ('the', 1902), ('this', 1190), ('and', 1190), ('i', 710), ('with', 548), ('movie', 525), ('story', 513), ('is', 488), ('it.', 407)]\n", "2025-07-15 15:57:08,943 - Dataset splits - Train: 1200, Val: 400, Test: 400\n", "2025-07-15 15:57:08,943 - Train label distribution: Counter({0: 600, 1: 600})\n", "2025-07-15 15:57:08,943 - Val label distribution: Counter({1: 200, 0: 200})\n", "2025-07-15 15:57:08,943 - Test label distribution: Counter({0: 200, 1: 200})\n", "2025-07-15 15:57:08,944 - Created data loaders\n", "2025-07-15 15:57:08,951 - Model parameters - Total: 218114, Trainable: 218114\n", "2025-07-15 15:57:08,952 - Testing model with sample input shape: torch.Size([32, 50])\n", "2025-07-15 15:57:08,956 - Sample output shape: torch.Size([32, 2])\n", "2025-07-15 15:57:08,963 - Sample output values: tensor([ 0.0594, -0.0430], device='mps:0')\n", "2025-07-15 15:57:08,964 - Starting model training with improved parameters\n", "2025-07-15 15:57:08,965 - First batch shapes - Data: torch.Size([32, 50]), Target: torch.Size([32])\n", "2025-07-15 15:57:08,970 - Target labels in first batch: tensor([1, 0, 0, 1, 0, 1, 0, 0, 0, 1], device='mps:0')\n", "2025-07-15 15:57:08,973 - Model output shape: torch.Size([32, 2])\n", "2025-07-15 15:57:08,980 - Output logits sample: tensor([ 0.0127, -0.0346], device='mps:0', grad_fn=)\n", "2025-07-15 15:57:09,742 - Epoch [1/20]\n", "2025-07-15 15:57:09,742 - Train Loss: 0.6959, Train Acc: 47.67%\n", "2025-07-15 15:57:09,742 - Val Loss: 0.6933, Val Acc: 50.00%\n", "2025-07-15 15:57:09,743 - LR: 0.001000\n", "2025-07-15 15:57:09,743 - New best validation accuracy: 50.00%\n", "2025-07-15 15:57:10,198 - Epoch [2/20]\n", "2025-07-15 15:57:10,198 - Train Loss: 0.6929, Train Acc: 50.92%\n", "2025-07-15 15:57:10,198 - Val Loss: 0.6942, 
Val Acc: 50.00%\n", "2025-07-15 15:57:10,198 - LR: 0.001000\n", "2025-07-15 15:57:10,198 - No improvement for 1 epochs\n", "2025-07-15 15:57:10,673 - Epoch [3/20]\n", "2025-07-15 15:57:10,673 - Train Loss: 0.6963, Train Acc: 51.42%\n", "2025-07-15 15:57:10,674 - Val Loss: 0.6949, Val Acc: 50.00%\n", "2025-07-15 15:57:10,674 - LR: 0.001000\n", "2025-07-15 15:57:10,674 - No improvement for 2 epochs\n", "2025-07-15 15:57:11,120 - Epoch [4/20]\n", "2025-07-15 15:57:11,120 - Train Loss: 0.6938, Train Acc: 50.33%\n", "2025-07-15 15:57:11,120 - Val Loss: 0.6934, Val Acc: 50.50%\n", "2025-07-15 15:57:11,120 - LR: 0.001000\n", "2025-07-15 15:57:11,121 - New best validation accuracy: 50.50%\n", "2025-07-15 15:57:11,570 - Epoch [5/20]\n", "2025-07-15 15:57:11,570 - Train Loss: 0.6940, Train Acc: 48.08%\n", "2025-07-15 15:57:11,571 - Val Loss: 0.6933, Val Acc: 49.50%\n", "2025-07-15 15:57:11,571 - LR: 0.000500\n", "2025-07-15 15:57:11,571 - No improvement for 1 epochs\n", "2025-07-15 15:57:12,031 - Epoch [6/20]\n", "2025-07-15 15:57:12,031 - Train Loss: 0.6941, Train Acc: 47.58%\n", "2025-07-15 15:57:12,032 - Val Loss: 0.6933, Val Acc: 50.50%\n", "2025-07-15 15:57:12,032 - LR: 0.000500\n", "2025-07-15 15:57:12,032 - No improvement for 2 epochs\n", "2025-07-15 15:57:12,467 - Epoch [7/20]\n", "2025-07-15 15:57:12,468 - Train Loss: 0.6935, Train Acc: 48.58%\n", "2025-07-15 15:57:12,468 - Val Loss: 0.6934, Val Acc: 48.25%\n", "2025-07-15 15:57:12,468 - LR: 0.000500\n", "2025-07-15 15:57:12,468 - No improvement for 3 epochs\n", "2025-07-15 15:57:12,905 - Epoch [8/20]\n", "2025-07-15 15:57:12,906 - Train Loss: 0.6935, Train Acc: 48.83%\n", "2025-07-15 15:57:12,906 - Val Loss: 0.6932, Val Acc: 50.00%\n", "2025-07-15 15:57:12,906 - LR: 0.000500\n", "2025-07-15 15:57:12,906 - No improvement for 4 epochs\n", "2025-07-15 15:57:13,338 - Epoch [9/20]\n", "2025-07-15 15:57:13,338 - Train Loss: 0.6925, Train Acc: 52.58%\n", "2025-07-15 15:57:13,338 - Val Loss: 0.6935, Val Acc: 49.00%\n", 
"2025-07-15 15:57:13,339 - LR: 0.000500\n", "2025-07-15 15:57:13,339 - No improvement for 5 epochs\n", "2025-07-15 15:57:13,339 - Early stopping triggered after 9 epochs\n", "2025-07-15 15:57:13,339 - Testing model on test set\n", "2025-07-15 15:57:13,446 - Test Accuracy: 0.4950\n", "2025-07-15 15:57:13,446 - Classification Report:\n", "2025-07-15 15:57:13,449 - \n", " precision recall f1-score support\n", "\n", " Negative 0.42 0.03 0.05 200\n", " Positive 0.50 0.96 0.66 200\n", "\n", " accuracy 0.49 400\n", " macro avg 0.46 0.49 0.35 400\n", "weighted avg 0.46 0.49 0.35 400\n", "\n", "2025-07-15 15:57:13,449 - Demonstrating model predictions on sample texts\n", "2025-07-15 15:57:13,751 - Text: 'This movie is absolutely fantastic and amazing! I loved every minute of it.'\n", "2025-07-15 15:57:13,752 - Prediction: Negative (Confidence: 0.506)\n", "2025-07-15 15:57:13,753 - Probabilities - Negative: 0.506, Positive: 0.494\n", "2025-07-15 15:57:13,753 - ---\n", "2025-07-15 15:57:13,759 - Text: 'Terrible boring film, complete waste of time. I hated everything about it.'\n", "2025-07-15 15:57:13,759 - Prediction: Negative (Confidence: 0.506)\n", "2025-07-15 15:57:13,760 - Probabilities - Negative: 0.506, Positive: 0.494\n", "2025-07-15 15:57:13,760 - ---\n", "2025-07-15 15:57:13,767 - Text: 'Excellent story with wonderful acting and brilliant performance throughout.'\n", "2025-07-15 15:57:13,767 - Prediction: Negative (Confidence: 0.506)\n", "2025-07-15 15:57:13,768 - Probabilities - Negative: 0.506, Positive: 0.494\n", "2025-07-15 15:57:13,768 - ---\n", "2025-07-15 15:57:13,772 - Text: 'Awful movie with horrible dialogue. Disappointed and would not recommend.'\n", "2025-07-15 15:57:13,772 - Prediction: Negative (Confidence: 0.506)\n", "2025-07-15 15:57:13,773 - Probabilities - Negative: 0.506, Positive: 0.494\n", "2025-07-15 15:57:13,774 - ---\n", "2025-07-15 15:57:13,778 - Text: 'The cinematography was superb and the plot was incredible. 
Highly recommend!'\n", "2025-07-15 15:57:13,778 - Prediction: Negative (Confidence: 0.506)\n", "2025-07-15 15:57:13,779 - Probabilities - Negative: 0.506, Positive: 0.494\n", "2025-07-15 15:57:13,779 - ---\n", "2025-07-15 15:57:13,783 - Text: 'Poor script with terrible acting. One of the worst films I have ever seen.'\n", "2025-07-15 15:57:13,783 - Prediction: Negative (Confidence: 0.506)\n", "2025-07-15 15:57:13,784 - Probabilities - Negative: 0.506, Positive: 0.494\n", "2025-07-15 15:57:13,784 - ---\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "2025-07-15 15:57:13,908 - Implementation completed successfully!\n" ] } ], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "from torch.utils.data import DataLoader, Dataset\n", "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import LabelEncoder\n", "from sklearn.metrics import accuracy_score, classification_report\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import logging\n", "from collections import Counter\n", "import random\n", "\n", "logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(message)s')\n", "logger = logging.getLogger(__name__)\n", "\n", "device = torch.device('mps' if torch.backends.mps.is_available() else 'cuda' if torch.cuda.is_available() else 'cpu')\n", "logger.info(f\"Using device: {device}\")\n", "\n", "class TextDataset(Dataset):\n", " def __init__(self, texts, labels, vocab_to_idx, max_length=50):\n", " self.texts = texts\n", " self.labels = labels\n", " self.vocab_to_idx = vocab_to_idx\n", " self.max_length = max_length\n", " \n", " def __len__(self):\n", " return len(self.texts)\n", " \n", " def __getitem__(self, idx):\n", " text = self.texts[idx]\n", " label = self.labels[idx]\n", " \n", " tokens = text.lower().split()\n", " token_ids = [self.vocab_to_idx.get(token, self.vocab_to_idx['']) for token in tokens]\n", " \n", " if len(token_ids) < self.max_length:\n", " token_ids.extend([self.vocab_to_idx['']] * (self.max_length - len(token_ids)))\n", " else:\n", " token_ids = token_ids[:self.max_length]\n", " \n", " return torch.tensor(token_ids, dtype=torch.long), torch.tensor(label, dtype=torch.long)\n", "\n", "class LSTMClassifier(nn.Module):\n", " def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers, num_classes, dropout=0.3):\n", " 
super(LSTMClassifier, self).__init__()\n", " self.hidden_dim = hidden_dim\n", " self.num_layers = num_layers\n", " \n", " self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)\n", " self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, \n", " batch_first=True, dropout=dropout, bidirectional=True)\n", " self.dropout = nn.Dropout(dropout)\n", " self.fc = nn.Linear(hidden_dim * 2, num_classes)\n", " \n", " def forward(self, x):\n", " batch_size = x.size(0)\n", " \n", " embedded = self.embedding(x)\n", " \n", " h0 = torch.zeros(self.num_layers * 2, batch_size, self.hidden_dim).to(x.device)\n", " c0 = torch.zeros(self.num_layers * 2, batch_size, self.hidden_dim).to(x.device)\n", " \n", " lstm_out, (hidden, _) = self.lstm(embedded, (h0, c0))\n", " \n", " output = self.dropout(lstm_out[:, -1, :])\n", " output = self.fc(output)\n", " \n", " return output\n", "\n", "def create_realistic_movie_dataset(num_samples=2000):\n", " logger.info(\"Creating realistic movie review dataset with clear sentiment patterns\")\n", " \n", " positive_templates = [\n", " \"This movie is {adj1} and {adj2}. The {element} was {quality}.\",\n", " \"I {loved} this {film_type}! The {element} was {excellent} and the {element2} was {amazing}.\",\n", " \"What a {fantastic} {film_type}! {loved} every minute of it. The {element} was {brilliant}.\",\n", " \"{excellent} {film_type} with {outstanding} {element}. Highly {recommend} it!\",\n", " \"The {element} in this movie was {superb}. {loved} the {element2} and {element3}.\",\n", " \"This is one of the {best} movies I have ever seen. {brilliant} {element} and {amazing} {element2}.\",\n", " \"I was {amazed} by this {film_type}. The {element} was {perfect} and {element2} was {incredible}.\",\n", " \"{wonderful} story with {excellent} {element}. {loved} everything about it.\",\n", " ]\n", " \n", " negative_templates = [\n", " \"This movie is {adj1} and {adj2}. The {element} was {quality}.\",\n", " \"I {hated} this {film_type}! 
The {element} was {terrible} and the {element2} was {awful}.\",\n",
"        \"What a {horrible} {film_type}! {wasted} my time. The {element} was {pathetic}.\",\n",
"        \"{terrible} {film_type} with {awful} {element}. Do not {recommend} it!\",\n",
"        \"The {element} in this movie was {boring}. {hated} the {element2} and {element3}.\",\n",
"        \"This is one of the {worst} movies I have ever seen. {terrible} {element} and {horrible} {element2}.\",\n",
"        \"I was {disappointed} by this {film_type}. The {element} was {bad} and {element2} was {ridiculous}.\",\n",
"        \"{disappointing} story with {poor} {element}. {hated} everything about it.\",\n",
"    ]\n",
"\n",
"    positive_words = {\n",
"        'adj1': ['amazing', 'fantastic', 'brilliant', 'excellent', 'wonderful'],\n",
"        'adj2': ['outstanding', 'superb', 'incredible', 'magnificent', 'marvelous'],\n",
"        'quality': ['excellent', 'brilliant', 'amazing', 'fantastic', 'superb'],\n",
"        'loved': ['loved', 'adored', 'enjoyed'],\n",
"        'excellent': ['excellent', 'brilliant', 'amazing'],\n",
"        'amazing': ['amazing', 'fantastic', 'incredible'],\n",
"        'brilliant': ['brilliant', 'superb', 'outstanding'],\n",
"        'fantastic': ['fantastic', 'wonderful', 'marvelous'],\n",
"        'outstanding': ['outstanding', 'exceptional', 'remarkable'],\n",
"        'superb': ['superb', 'magnificent', 'splendid'],\n",
"        'recommend': ['recommend', 'suggest'],\n",
"        'best': ['best', 'greatest', 'finest'],\n",
"        'perfect': ['perfect', 'flawless', 'ideal'],\n",
"        'incredible': ['incredible', 'unbelievable', 'amazing'],\n",
"        'wonderful': ['wonderful', 'delightful', 'lovely'],\n",
"        'amazed': ['amazed', 'impressed', 'stunned']\n",
"    }\n",
"\n",
"    negative_words = {\n",
"        'adj1': ['terrible', 'awful', 'horrible', 'bad', 'disappointing'],\n",
"        'adj2': ['boring', 'stupid', 'ridiculous', 'pathetic', 'useless'],\n",
"        'quality': ['terrible', 'awful', 'horrible', 'bad', 'disappointing'],\n",
"        'hated': ['hated', 'despised', 'disliked'],\n",
"        'terrible': ['terrible', 'awful', 'horrible'],\n",
"        'awful': ['awful', 'dreadful', 'atrocious'],\n",
"        'horrible': ['horrible', 'disgusting', 'repulsive'],\n",
"        'pathetic': ['pathetic', 'pitiful', 'miserable'],\n",
"        'boring': ['boring', 'dull', 'tedious'],\n",
"        'recommend': ['recommend', 'suggest'],\n",
"        'worst': ['worst', 'poorest', 'most terrible'],\n",
"        'bad': ['bad', 'poor', 'weak'],\n",
"        'ridiculous': ['ridiculous', 'absurd', 'nonsensical'],\n",
"        'disappointed': ['disappointed', 'let down', 'frustrated'],\n",
"        'disappointing': ['disappointing', 'unsatisfying', 'mediocre'],\n",
"        'poor': ['poor', 'weak', 'inadequate'],\n",
"        'wasted': ['wasted', 'lost']\n",
"    }\n",
"\n",
"    elements = ['acting', 'plot', 'story', 'dialogue', 'cinematography', 'direction', 'script', 'characters', 'soundtrack', 'ending']\n",
"    film_types = ['movie', 'film', 'picture', 'drama', 'thriller', 'comedy']\n",
"\n",
"    texts = []\n",
"    labels = []\n",
"\n",
"    for i in range(num_samples):\n",
"        if i < num_samples // 2:\n",
"            template = random.choice(positive_templates)\n",
"            word_dict = positive_words\n",
"            label = 1\n",
"        else:\n",
"            template = random.choice(negative_templates)\n",
"            word_dict = negative_words\n",
"            label = 0\n",
"\n",
"        text = template\n",
"\n",
"        for key in word_dict:\n",
"            if '{' + key + '}' in text:\n",
"                text = text.replace('{' + key + '}', random.choice(word_dict[key]))\n",
"\n",
"        text = text.replace('{element}', random.choice(elements))\n",
"        text = text.replace('{element2}', random.choice(elements))\n",
"        text = text.replace('{element3}', random.choice(elements))\n",
"        text = text.replace('{film_type}', random.choice(film_types))\n",
"\n",
"        remaining_placeholders = [word for word in text.split() if word.startswith('{') and word.endswith('}')]\n",
"        for placeholder in remaining_placeholders:\n",
"            clean_placeholder = placeholder.strip('{}')\n",
"            if clean_placeholder in positive_words:\n",
"                text = text.replace(placeholder, random.choice(positive_words[clean_placeholder]))\n",
"            elif 
clean_placeholder in negative_words:\n",
"                text = text.replace(placeholder, random.choice(negative_words[clean_placeholder]))\n",
"\n",
"        texts.append(text)\n",
"        labels.append(label)\n",
"\n",
"    logger.info(f\"Created {len(texts)} realistic movie review samples\")\n",
"    logger.info(f\"Label distribution: {Counter(labels)}\")\n",
"    logger.info(f\"Sample positive review: {[text for text, label in zip(texts, labels) if label == 1][0]}\")\n",
"    logger.info(f\"Sample negative review: {[text for text, label in zip(texts, labels) if label == 0][0]}\")\n",
"\n",
"    return texts, labels\n",
"\n",
"def build_vocabulary(texts, min_freq=2):\n",
"    logger.info(\"Building vocabulary from texts\")\n",
"\n",
"    word_counts = Counter()\n",
"    for text in texts:\n",
"        words = text.lower().split()\n",
"        word_counts.update(words)\n",
"\n",
"    # Index 0 pads short sequences; index 1 stands in for out-of-vocabulary words\n",
"    vocab_to_idx = {'<PAD>': 0, '<UNK>': 1}\n",
"    idx = 2\n",
"\n",
"    for word, count in word_counts.items():\n",
"        if count >= min_freq:\n",
"            vocab_to_idx[word] = idx\n",
"            idx += 1\n",
"\n",
"    logger.info(f\"Vocabulary size: {len(vocab_to_idx)}\")\n",
"    logger.info(f\"Most common words: {word_counts.most_common(10)}\")\n",
"\n",
"    return vocab_to_idx\n",
"\n",
"def train_model(model, train_loader, val_loader, num_epochs=20):\n",
"    logger.info(\"Starting model training with improved parameters\")\n",
"\n",
"    criterion = nn.CrossEntropyLoss()\n",
"    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)\n",
"    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3, factor=0.5)\n",
"\n",
"    train_losses = []\n",
"    val_losses = []\n",
"    val_accuracies = []\n",
"    best_val_acc = 0\n",
"    patience = 5\n",
"    patience_counter = 0\n",
"\n",
"    for epoch in range(num_epochs):\n",
"        model.train()\n",
"        total_train_loss = 0\n",
"        train_correct = 0\n",
"        train_total = 0\n",
"\n",
"        for batch_idx, (data, target) in enumerate(train_loader):\n",
"            data, target = data.to(device), target.to(device)\n",
"\n",
"            if batch_idx == 0 and epoch == 0:\n",
"                logger.info(f\"First batch shapes - Data: {data.shape}, Target: {target.shape}\")\n",
"                logger.info(f\"Target labels in first batch: {target[:10]}\")\n",
"\n",
"            optimizer.zero_grad()\n",
"            output = model(data)\n",
"\n",
"            if batch_idx == 0 and epoch == 0:\n",
"                logger.info(f\"Model output shape: {output.shape}\")\n",
"                logger.info(f\"Output logits sample: {output[0]}\")\n",
"\n",
"            loss = criterion(output, target)\n",
"            loss.backward()\n",
"\n",
"            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)\n",
"            optimizer.step()\n",
"\n",
"            total_train_loss += loss.item()\n",
"            _, predicted = torch.max(output.data, 1)\n",
"            train_total += target.size(0)\n",
"            train_correct += (predicted == target).sum().item()\n",
"\n",
"        avg_train_loss = total_train_loss / len(train_loader)\n",
"        train_accuracy = 100 * train_correct / train_total\n",
"        train_losses.append(avg_train_loss)\n",
"\n",
"        model.eval()\n",
"        total_val_loss = 0\n",
"        correct = 0\n",
"        total = 0\n",
"\n",
"        with torch.no_grad():\n",
"            for data, target in val_loader:\n",
"                data, target = data.to(device), target.to(device)\n",
"                output = model(data)\n",
"                loss = criterion(output, target)\n",
"                total_val_loss += loss.item()\n",
"\n",
"                _, predicted = torch.max(output.data, 1)\n",
"                total += target.size(0)\n",
"                correct += (predicted == target).sum().item()\n",
"\n",
"        avg_val_loss = total_val_loss / len(val_loader)\n",
"        val_accuracy = 100 * correct / total\n",
"\n",
"        val_losses.append(avg_val_loss)\n",
"        val_accuracies.append(val_accuracy)\n",
"\n",
"        scheduler.step(avg_val_loss)\n",
"        current_lr = optimizer.param_groups[0]['lr']\n",
"\n",
"        logger.info(f'Epoch [{epoch+1}/{num_epochs}]')\n",
"        logger.info(f'Train Loss: {avg_train_loss:.4f}, Train Acc: {train_accuracy:.2f}%')\n",
"        logger.info(f'Val Loss: {avg_val_loss:.4f}, Val Acc: {val_accuracy:.2f}%')\n",
"        logger.info(f'LR: {current_lr:.6f}')\n",
"\n",
"        if val_accuracy > best_val_acc:\n",
"            best_val_acc = val_accuracy\n",
"            patience_counter = 0\n",
"            logger.info(f'New best validation accuracy: {best_val_acc:.2f}%')\n",
"        else:\n",
"            patience_counter += 1\n",
"            logger.info(f'No improvement for {patience_counter} epochs')\n",
"\n",
"        if patience_counter >= patience:\n",
"            logger.info(f'Early stopping triggered after {epoch+1} epochs')\n",
"            break\n",
"\n",
"    return train_losses, val_losses, val_accuracies\n",
"\n",
"def test_model(model, test_loader):\n",
"    logger.info(\"Testing model on test set\")\n",
"\n",
"    model.eval()\n",
"    all_predictions = []\n",
"    all_targets = []\n",
"    all_probabilities = []\n",
"\n",
"    with torch.no_grad():\n",
"        for data, target in test_loader:\n",
"            data, target = data.to(device), target.to(device)\n",
"            output = model(data)\n",
"            probabilities = torch.softmax(output, dim=1)\n",
"            _, predicted = torch.max(output, 1)\n",
"\n",
"            all_predictions.extend(predicted.cpu().numpy())\n",
"            all_targets.extend(target.cpu().numpy())\n",
"            all_probabilities.extend(probabilities.cpu().numpy())\n",
"\n",
"    accuracy = accuracy_score(all_targets, all_predictions)\n",
"    logger.info(f'Test Accuracy: {accuracy:.4f}')\n",
"\n",
"    logger.info(\"Classification Report:\")\n",
"    try:\n",
"        report = classification_report(all_targets, all_predictions,\n",
"                                       target_names=['Negative', 'Positive'],\n",
"                                       zero_division=0)\n",
"        logger.info(f\"\\n{report}\")\n",
"    except Exception as e:\n",
"        logger.info(f\"Error in classification report: {e}\")\n",
"\n",
"    return accuracy, all_predictions, all_targets, all_probabilities\n",
"\n",
"def demonstrate_predictions(model, vocab_to_idx, sample_texts):\n",
"    logger.info(\"Demonstrating model predictions on sample texts\")\n",
"\n",
"    model.eval()\n",
"    max_length = 50\n",
"\n",
"    with torch.no_grad():\n",
"        for text in sample_texts:\n",
"            tokens = text.lower().split()\n",
"            token_ids = [vocab_to_idx.get(token, vocab_to_idx['<UNK>']) for token in tokens]\n",
"\n",
"            if len(token_ids) < max_length:\n",
"                token_ids.extend([vocab_to_idx['<PAD>']] * (max_length - len(token_ids)))\n",
"            else:\n",
"                token_ids = token_ids[:max_length]\n",
"\n",
"            input_tensor = torch.tensor([token_ids], dtype=torch.long).to(device)\n",
"            output = model(input_tensor)\n",
"            probabilities = torch.softmax(output, dim=1)\n",
"            predicted_class = torch.argmax(output, dim=1).item()\n",
"            confidence = probabilities[0][predicted_class].item()\n",
"\n",
"            sentiment = \"Positive\" if predicted_class == 1 else \"Negative\"\n",
"            logger.info(f\"Text: '{text}'\")\n",
"            logger.info(f\"Prediction: {sentiment} (Confidence: {confidence:.3f})\")\n",
"            logger.info(f\"Probabilities - Negative: {probabilities[0][0]:.3f}, Positive: {probabilities[0][1]:.3f}\")\n",
"            logger.info(\"---\")\n",
"\n",
"def main():\n",
"    logger.info(\"Starting improved sequence modeling implementation\")\n",
"\n",
"    torch.manual_seed(42)\n",
"    np.random.seed(42)\n",
"    random.seed(42)\n",
"\n",
"    texts, labels = create_realistic_movie_dataset(num_samples=2000)\n",
"\n",
"    vocab_to_idx = build_vocabulary(texts, min_freq=2)\n",
"\n",
"    X_temp, X_test, y_temp, y_test = train_test_split(texts, labels, test_size=0.2, random_state=42, stratify=labels)\n",
"    X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)\n",
"\n",
"    logger.info(f\"Dataset splits - Train: {len(X_train)}, Val: {len(X_val)}, Test: {len(X_test)}\")\n",
"    logger.info(f\"Train label distribution: {Counter(y_train)}\")\n",
"    logger.info(f\"Val label distribution: {Counter(y_val)}\")\n",
"    logger.info(f\"Test label distribution: {Counter(y_test)}\")\n",
"\n",
"    train_dataset = TextDataset(X_train, y_train, vocab_to_idx)\n",
"    val_dataset = TextDataset(X_val, y_val, vocab_to_idx)\n",
"    test_dataset = TextDataset(X_test, y_test, vocab_to_idx)\n",
"\n",
"    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)\n",
"    val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)\n",
"    test_loader 
= DataLoader(test_dataset, batch_size=32, shuffle=False)\n",
"\n",
"    logger.info(\"Created data loaders\")\n",
"\n",
"    vocab_size = len(vocab_to_idx)\n",
"    embedding_dim = 128\n",
"    hidden_dim = 64\n",
"    num_layers = 2\n",
"    num_classes = 2\n",
"\n",
"    model = LSTMClassifier(vocab_size, embedding_dim, hidden_dim, num_layers, num_classes, dropout=0.2)\n",
"    model.to(device)\n",
"\n",
"    total_params = sum(p.numel() for p in model.parameters())\n",
"    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
"    logger.info(f\"Model parameters - Total: {total_params}, Trainable: {trainable_params}\")\n",
"\n",
"    sample_batch = next(iter(train_loader))\n",
"    sample_input, sample_target = sample_batch\n",
"    sample_input = sample_input.to(device)\n",
"\n",
"    logger.info(f\"Testing model with sample input shape: {sample_input.shape}\")\n",
"    with torch.no_grad():\n",
"        sample_output = model(sample_input)\n",
"        logger.info(f\"Sample output shape: {sample_output.shape}\")\n",
"        logger.info(f\"Sample output values: {sample_output[0]}\")\n",
"\n",
"    train_losses, val_losses, val_accuracies = train_model(model, train_loader, val_loader, num_epochs=20)\n",
"\n",
"    test_accuracy, predictions, targets, probabilities = test_model(model, test_loader)\n",
"\n",
"    sample_texts = [\n",
"        \"This movie is absolutely fantastic and amazing! I loved every minute of it.\",\n",
"        \"Terrible boring film, complete waste of time. I hated everything about it.\",\n",
"        \"Excellent story with wonderful acting and brilliant performance throughout.\",\n",
"        \"Awful movie with horrible dialogue. Disappointed and would not recommend.\",\n",
"        \"The cinematography was superb and the plot was incredible. Highly recommend!\",\n",
"        \"Poor script with terrible acting. 
One of the worst films I have ever seen.\"\n",
"    ]\n",
"\n",
"    demonstrate_predictions(model, vocab_to_idx, sample_texts)\n",
"\n",
"    plt.figure(figsize=(15, 5))\n",
"\n",
"    plt.subplot(1, 3, 1)\n",
"    plt.plot(train_losses, label='Train Loss', color='blue')\n",
"    plt.plot(val_losses, label='Validation Loss', color='red')\n",
"    plt.title('Training and Validation Loss')\n",
"    plt.xlabel('Epoch')\n",
"    plt.ylabel('Loss')\n",
"    plt.legend()\n",
"    plt.grid(True)\n",
"\n",
"    plt.subplot(1, 3, 2)\n",
"    plt.plot(val_accuracies, label='Validation Accuracy', color='green')\n",
"    plt.title('Validation Accuracy')\n",
"    plt.xlabel('Epoch')\n",
"    plt.ylabel('Accuracy (%)')\n",
"    plt.legend()\n",
"    plt.grid(True)\n",
"\n",
"    plt.subplot(1, 3, 3)\n",
"    predictions_array = np.array(predictions)\n",
"    targets_array = np.array(targets)\n",
"\n",
"    from sklearn.metrics import confusion_matrix\n",
"    cm = confusion_matrix(targets_array, predictions_array)\n",
"    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',\n",
"                xticklabels=['Negative', 'Positive'],\n",
"                yticklabels=['Negative', 'Positive'])\n",
"    plt.title('Confusion Matrix')\n",
"    plt.ylabel('True Label')\n",
"    plt.xlabel('Predicted Label')\n",
"\n",
"    plt.tight_layout()\n",
"    plt.show()\n",
"\n",
"    logger.info(\"Implementation completed successfully!\")\n",
"\n",
"if __name__ == \"__main__\":\n",
"    main()" ] }, { "cell_type": "code", "execution_count": null, "id": "270376ad-0312-45b2-9a38-a8adb96ba80a", "metadata": {}, "outputs": [], "source": [ "import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"from torch.utils.data import DataLoader, Dataset\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import LabelEncoder\n",
"from sklearn.metrics import accuracy_score, classification_report\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from collections import Counter\n",
"import random\n",
"\n",
"# Set up device detection for optimal performance across different hardware\n",
"# Priority: Apple Silicon MPS > NVIDIA CUDA > CPU fallback\n",
"device = torch.device('mps' if torch.backends.mps.is_available() else 'cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"class TextDataset(Dataset):\n",
"    \"\"\"\n",
"    Custom PyTorch Dataset class for text classification tasks.\n",
"    Handles text preprocessing, tokenization, and sequence padding/truncation.\n",
"    \"\"\"\n",
"    def __init__(self, texts, labels, vocab_to_idx, max_length=50):\n",
"        \"\"\"\n",
"        Initialize the dataset with text data and preprocessing parameters.\n",
"\n",
"        Args:\n",
"            texts (list): List of text strings for classification\n",
"            labels (list): Corresponding integer labels (0 for negative, 1 for positive)\n",
"            vocab_to_idx (dict): Vocabulary mapping from words to integer indices\n",
"            max_length (int): Fixed sequence length for batching (pad short, truncate long)\n",
"        \"\"\"\n",
"        self.texts = texts\n",
"        self.labels = labels\n",
"        self.vocab_to_idx = vocab_to_idx  # Word-to-index mapping for tokenization\n",
"        self.max_length = max_length  # Standardize all sequences to this length\n",
"\n",
"    def __len__(self):\n",
"        \"\"\"Return total number of samples in the dataset.\"\"\"\n",
"        return len(self.texts)\n",
"\n",
"    def __getitem__(self, idx):\n",
"        \"\"\"\n",
"        Retrieve and preprocess a single sample from the dataset.\n",
"\n",
"        This method:\n",
"        1. Gets the text and label at the specified index\n",
"        2. Tokenizes the text (splits on whitespace, converts to lowercase)\n",
"        3. Maps each token to its vocabulary index (uses <UNK> for unknown words)\n",
"        4. Applies padding (with <PAD>) or truncation to reach max_length\n",
"        5. Returns PyTorch tensors ready for model input\n",
"\n",
"        Args:\n",
"            idx (int): Index of the sample to retrieve\n",
"\n",
"        Returns:\n",
"            tuple: (token_ids_tensor, label_tensor) both as LongTensors\n",
"        \"\"\"\n",
"        text = self.texts[idx]\n",
"        label = self.labels[idx]\n",
"\n",
"        # Tokenization: split text into individual words and normalize case\n",
"        tokens = text.lower().split()\n",
"\n",
"        # Convert tokens to vocabulary indices\n",
"        # get() returns vocab_to_idx['<UNK>'] if the token is not in the vocabulary\n",
"        # This handles out-of-vocabulary words gracefully\n",
"        token_ids = [self.vocab_to_idx.get(token, self.vocab_to_idx['<UNK>']) for token in tokens]\n",
"\n",
"        # Sequence length normalization for efficient batching\n",
"        if len(token_ids) < self.max_length:\n",
"            # Pad short sequences with <PAD> tokens (index 0)\n",
"            # This ensures all sequences in a batch have the same length\n",
"            token_ids.extend([self.vocab_to_idx['<PAD>']] * (self.max_length - len(token_ids)))\n",
"        else:\n",
"            # Truncate long sequences to max_length\n",
"            # This prevents memory issues and maintains consistent input size\n",
"            token_ids = token_ids[:self.max_length]\n",
"\n",
"        # Convert to PyTorch tensors with appropriate data types\n",
"        # LongTensor is required for embedding layer indices\n",
"        return torch.tensor(token_ids, dtype=torch.long), torch.tensor(label, dtype=torch.long)\n",
"\n",
"class LSTMClassifier(nn.Module):\n",
"    \"\"\"\n",
"    Bidirectional LSTM classifier for text sentiment analysis.\n",
"\n",
"    Architecture:\n",
"    Input -> Embedding -> Bidirectional LSTM -> Dropout -> Linear -> Output\n",
"\n",
"    The bidirectional design allows the model to see context from both\n",
"    past and future words, improving understanding of sentiment patterns.\n",
"    \"\"\"\n",
"    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers, num_classes, dropout=0.3):\n",
"        \"\"\"\n",
"        Initialize the LSTM classifier architecture.\n",
"\n",
"        Args:\n",
"            vocab_size (int): Size of vocabulary (number of unique words + special tokens)\n",
"            embedding_dim (int): Dimension of word embeddings (feature size per word)\n",
"            hidden_dim (int): Dimension of LSTM hidden states (memory capacity)\n",
"            num_layers (int): Number of stacked LSTM layers (network depth)\n",
"            num_classes (int): Number of output classes (2 for binary sentiment)\n",
"            dropout (float): Dropout probability for regularization (0.0-1.0)\n",
"        \"\"\"\n",
"        super(LSTMClassifier, self).__init__()\n",
"        self.hidden_dim = hidden_dim\n",
"        self.num_layers = num_layers\n",
"\n",
"        # Embedding layer: converts sparse word indices to dense vector representations\n",
"        # padding_idx=0 keeps the <PAD> embedding at zero and excludes it from gradient updates\n",
"        # This layer learns semantic relationships between words during training\n",
"        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)\n",
"\n",
"        # Bidirectional LSTM: processes sequences in both forward and backward directions\n",
"        # batch_first=True: expects input shape (batch_size, sequence_length, features)\n",
"        # dropout: applied between LSTM layers (not after final layer)\n",
"        # bidirectional=True: doubles output dimension (forward + backward hidden states)\n",
"        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers,\n",
"                            batch_first=True, dropout=dropout, bidirectional=True)\n",
"\n",
"        # Dropout layer: randomly zeros elements during training to prevent overfitting\n",
"        # Only active during training, disabled during evaluation\n",
"        self.dropout = nn.Dropout(dropout)\n",
"\n",
"        # Final classification layer: maps LSTM output to class logits\n",
"        # Input dimension is hidden_dim * 2 due to bidirectional LSTM\n",
"        # Output dimension equals number of classes\n",
"        self.fc = nn.Linear(hidden_dim * 2, num_classes)\n",
"\n",
"    def forward(self, x):\n",
"        \"\"\"\n",
"        Forward pass through the network.\n",
"\n",
"        Processing flow:\n",
"        1. Convert word indices to embeddings\n",
"        2. Initialize LSTM hidden and cell states\n",
"        3. Process sequence through bidirectional LSTM\n",
"        4. Extract final time step output as the sequence summary\n",
"        5. Apply dropout for regularization\n",
"        6. Generate class logits through linear layer\n",
"\n",
"        Args:\n",
"            x (torch.Tensor): Input tensor of word indices, shape (batch_size, sequence_length)\n",
"\n",
"        Returns:\n",
"            torch.Tensor: Class logits, shape (batch_size, num_classes)\n",
"        \"\"\"\n",
"        batch_size = x.size(0)\n",
"\n",
"        # Convert word indices to dense embeddings\n",
"        # Shape: (batch_size, sequence_length) -> (batch_size, sequence_length, embedding_dim)\n",
"        embedded = self.embedding(x)\n",
"\n",
"        # Initialize LSTM hidden and cell states with zeros\n",
"        # Shape: (num_layers * num_directions, batch_size, hidden_dim)\n",
"        # num_directions = 2 for bidirectional LSTM\n",
"        # Zero initialization is standard practice for sequence modeling\n",
"        h0 = torch.zeros(self.num_layers * 2, batch_size, self.hidden_dim).to(x.device)\n",
"        c0 = torch.zeros(self.num_layers * 2, batch_size, self.hidden_dim).to(x.device)\n",
"\n",
"        # Process sequence through bidirectional LSTM\n",
"        # lstm_out: output at each time step, shape (batch_size, sequence_length, hidden_dim * 2)\n",
"        # hidden: final hidden state (not used for classification)\n",
"        # _: final cell state (discarded)\n",
"        lstm_out, (hidden, _) = self.lstm(embedded, (h0, c0))\n",
"\n",
"        # Extract final time step output for sequence classification\n",
"        # [:, -1, :] selects the last time step. At that position the forward direction has\n",
"        # read the whole (padded) sequence, while the backward direction has read only the\n",
"        # final position\n",
"        # Shape: (batch_size, sequence_length, hidden_dim * 2) -> (batch_size, hidden_dim * 2)\n",
"        output = self.dropout(lstm_out[:, -1, :])\n",
"\n",
"        # Generate final class logits\n",
"        # Shape: (batch_size, hidden_dim * 2) -> (batch_size, num_classes)\n",
"        # Logits will be converted to probabilities using softmax during inference\n",
"        output = self.fc(output)\n",
"\n",
"        return output\n",
"\n",
"def create_realistic_movie_dataset(num_samples=2000):\n",
"    \"\"\"\n",
"    Generate a synthetic movie review dataset with realistic sentiment patterns.\n",
"\n",
"    This function creates varied movie reviews using template-based generation\n",
"    to ensure clear sentiment distinctions that the model can learn from.\n",
"\n",
"    Design principles:\n",
"    - Use template sentences with placeholders for sentiment words\n",
"    - Maintain consistent positive/negative word associations\n",
"    - Generate varied sentence structures to prevent overfitting\n",
"    - Include movie-specific vocabulary (acting, plot, dialogue, etc.)\n",
"\n",
"    Args:\n",
"        num_samples (int): Total number of reviews to generate (split equally between classes)\n",
"\n",
"    Returns:\n",
"        tuple: (texts, labels) where texts is list of review strings, labels is list of 0/1 integers\n",
"    \"\"\"\n",
"    # Template sentences for positive reviews\n",
"    # Placeholders {} are filled with sentiment-appropriate words\n",
"    positive_templates = [\n",
"        \"This movie is {adj1} and {adj2}. The {element} was {quality}.\",\n",
"        \"I {loved} this {film_type}! The {element} was {excellent} and the {element2} was {amazing}.\",\n",
"        \"What a {fantastic} {film_type}! {loved} every minute of it. The {element} was {brilliant}.\",\n",
"        \"{excellent} {film_type} with {outstanding} {element}. Highly {recommend} it!\",\n",
"        \"The {element} in this movie was {superb}. {loved} the {element2} and {element3}.\",\n",
"        \"This is one of the {best} movies I have ever seen. {brilliant} {element} and {amazing} {element2}.\",\n",
"        \"I was {amazed} by this {film_type}. The {element} was {perfect} and {element2} was {incredible}.\",\n",
"        \"{wonderful} story with {excellent} {element}. {loved} everything about it.\",\n",
"    ]\n",
"\n",
"    # Template sentences for negative reviews\n",
"    # Similar structure but with negative sentiment words\n",
"    negative_templates = [\n",
"        \"This movie is {adj1} and {adj2}. The {element} was {quality}.\",\n",
"        \"I {hated} this {film_type}! The {element} was {terrible} and the {element2} was {awful}.\",\n",
"        \"What a {horrible} {film_type}! {wasted} my time. The {element} was {pathetic}.\",\n",
"        \"{terrible} {film_type} with {awful} {element}. Do not {recommend} it!\",\n",
"        \"The {element} in this movie was {boring}. {hated} the {element2} and {element3}.\",\n",
"        \"This is one of the {worst} movies I have ever seen. {terrible} {element} and {horrible} {element2}.\",\n",
"        \"I was {disappointed} by this {film_type}. The {element} was {bad} and {element2} was {ridiculous}.\",\n",
"        \"{disappointing} story with {poor} {element}. {hated} everything about it.\",\n",
"    ]\n",
"\n",
"    # Positive sentiment word dictionaries\n",
"    # Each key corresponds to a placeholder in templates\n",
"    # Multiple options for each placeholder create vocabulary diversity\n",
"    positive_words = {\n",
"        'adj1': ['amazing', 'fantastic', 'brilliant', 'excellent', 'wonderful'],\n",
"        'adj2': ['outstanding', 'superb', 'incredible', 'magnificent', 'marvelous'],\n",
"        'quality': ['excellent', 'brilliant', 'amazing', 'fantastic', 'superb'],\n",
"        'loved': ['loved', 'adored', 'enjoyed'],\n",
"        'excellent': ['excellent', 'brilliant', 'amazing'],\n",
"        'amazing': ['amazing', 'fantastic', 'incredible'],\n",
"        'brilliant': ['brilliant', 'superb', 'outstanding'],\n",
"        'fantastic': ['fantastic', 'wonderful', 'marvelous'],\n",
"        'outstanding': ['outstanding', 'exceptional', 'remarkable'],\n",
"        'superb': ['superb', 'magnificent', 'splendid'],\n",
"        'recommend': ['recommend', 'suggest'],\n",
"        'best': ['best', 'greatest', 'finest'],\n",
"        'perfect': ['perfect', 'flawless', 'ideal'],\n",
"        'incredible': ['incredible', 'unbelievable', 'amazing'],\n",
"        'wonderful': ['wonderful', 'delightful', 'lovely'],\n",
"        'amazed': ['amazed', 'impressed', 'stunned']\n",
"    }\n",
"\n",
"    # Negative sentiment word dictionaries\n",
"    # Parallel structure to positive words but with opposite sentiment\n",
"    negative_words = {\n",
"        'adj1': ['terrible', 'awful', 'horrible', 'bad', 'disappointing'],\n",
"        'adj2': ['boring', 'stupid', 'ridiculous', 'pathetic', 'useless'],\n",
"        'quality': ['terrible', 'awful', 'horrible', 'bad', 'disappointing'],\n",
"        'hated': ['hated', 'despised', 'disliked'],\n",
"        'terrible': ['terrible', 'awful', 'horrible'],\n",
"        'awful': ['awful', 'dreadful', 'atrocious'],\n",
"        'horrible': ['horrible', 'disgusting', 'repulsive'],\n",
"        'pathetic': ['pathetic', 'pitiful', 'miserable'],\n",
"        'boring': ['boring', 'dull', 'tedious'],\n",
"        'recommend': ['recommend', 'suggest'],\n",
"        'worst': ['worst', 'poorest', 'most terrible'],\n",
"        'bad': ['bad', 'poor', 'weak'],\n",
"        'ridiculous': ['ridiculous', 'absurd', 'nonsensical'],\n",
"        'disappointed': ['disappointed', 'let down', 'frustrated'],\n",
"        'disappointing': ['disappointing', 'unsatisfying', 'mediocre'],\n",
"        'poor': ['poor', 'weak', 'inadequate'],\n",
"        'wasted': ['wasted', 'lost']\n",
"    }\n",
"\n",
"    # Movie-specific vocabulary for realistic content\n",
"    # These words appear in both positive and negative reviews\n",
"    elements = ['acting', 'plot', 'story', 'dialogue', 'cinematography', 'direction', 'script', 'characters', 'soundtrack', 'ending']\n",
"    film_types = ['movie', 'film', 'picture', 'drama', 'thriller', 'comedy']\n",
"\n",
"    texts = []\n",
"    labels = []\n",
"\n",
"    # Generate equal numbers of positive and negative samples\n",
"    for i in range(num_samples):\n",
"        if i < num_samples // 2:\n",
"            # Generate positive review\n",
"            template = random.choice(positive_templates)\n",
"            word_dict = positive_words\n",
"            label = 1  # Positive class\n",
"        else:\n",
"            # Generate negative review\n",
"            template = random.choice(negative_templates)\n",
"            word_dict = negative_words\n",
"            label = 0  # Negative class\n",
"\n",
"        text = template\n",
"\n",
"        # Replace sentiment placeholders with appropriate words\n",
"        for key in word_dict:\n",
"            if '{' + key + '}' in text:\n",
"                text = text.replace('{' + key + '}', random.choice(word_dict[key]))\n",
"\n",
"        # Replace movie element placeholders\n",
"        # Multiple element replacements create sentence variety\n",
"        text = text.replace('{element}', random.choice(elements))\n",
"        text = text.replace('{element2}', random.choice(elements))\n",
"        text = text.replace('{element3}', random.choice(elements))\n",
"        text = text.replace('{film_type}', random.choice(film_types))\n",
"\n",
"        # Handle any remaining placeholders\n",
"        # This ensures no template placeholders remain in final text\n",
"        remaining_placeholders = [word for word in text.split() if word.startswith('{') and word.endswith('}')]\n",
"        for placeholder in remaining_placeholders:\n",
"            clean_placeholder = placeholder.strip('{}')\n",
"            if clean_placeholder in positive_words:\n",
"                text = text.replace(placeholder, random.choice(positive_words[clean_placeholder]))\n",
"            elif clean_placeholder in negative_words:\n",
"                text = text.replace(placeholder, random.choice(negative_words[clean_placeholder]))\n",
"\n",
"        texts.append(text)\n",
"        labels.append(label)\n",
"\n",
"    return texts, labels\n",
"\n",
"def build_vocabulary(texts, min_freq=2):\n",
"    \"\"\"\n",
"    Build vocabulary from training texts with frequency-based filtering.\n",
"\n",
"    This function creates a word-to-index mapping that enables efficient\n",
"    text processing and handles the vocabulary size vs. coverage trade-off.\n",
"\n",
"    Process:\n",
"    1. Count frequency of each word across all texts\n",
"    2. Filter out rare words below min_freq threshold\n",
"    3. Assign unique indices to remaining words\n",
"    4. Add special tokens for padding and unknown words\n",
"\n",
"    Args:\n",
"        texts (list): List of text strings to analyze\n",
"        min_freq (int): Minimum frequency threshold for including words\n",
"\n",
"    Returns:\n",
"        dict: Vocabulary mapping from words to integer indices\n",
"    \"\"\"\n",
"    # Count word frequencies across entire corpus\n",
"    word_counts = Counter()\n",
"    for text in texts:\n",
"        words = text.lower().split()  # Normalize case for consistency\n",
"        word_counts.update(words)\n",
"\n",
"    # Initialize vocabulary with special tokens\n",
"    # Index 0: <PAD> for sequence padding (shorter sequences)\n",
"    # Index 1: <UNK> for unknown words (not in training vocabulary)\n",
"    # These special tokens are essential for robust text processing\n",
"    vocab_to_idx = {'<PAD>': 0, '<UNK>': 1}\n",
"    idx = 2\n",
"\n",
"    # Add frequent words to vocabulary\n",
"    # min_freq filtering removes rare words that might not generalize well\n",
"    # This reduces vocabulary size while maintaining coverage of important words\n",
"    for word, count in word_counts.items():\n",
"        if count >= min_freq:\n",
"            vocab_to_idx[word] = idx\n",
"            idx += 1\n",
"\n",
"    return vocab_to_idx\n",
"\n",
"def train_model(model, train_loader, val_loader, num_epochs=20):\n",
"    \"\"\"\n",
"    Train the LSTM model with comprehensive optimization techniques.\n",
"\n",
"    Training features:\n",
"    - Cross-entropy loss for classification\n",
"    - Adam optimizer with weight decay regularization\n",
"    - Learning rate scheduling based on validation performance\n",
"    - Early stopping to prevent overfitting\n",
"    - Gradient clipping to handle exploding gradients\n",
"    - Comprehensive metric tracking\n",
"\n",
"    Args:\n",
"        model: PyTorch model to train\n",
"        train_loader: DataLoader for training data\n",
"        val_loader: DataLoader for validation data\n",
"        num_epochs: Maximum number of training epochs\n",
"\n",
"    Returns:\n",
"        tuple: (train_losses, val_losses, val_accuracies) for analysis\n",
"    \"\"\"\n",
"    # Loss function: Cross-entropy 
automatically applies softmax and computes negative log-likelihood\n", " # Ideal for multi-class classification problems\n", " criterion = nn.CrossEntropyLoss()\n", " \n", " # Optimizer: Adam combines momentum with adaptive learning rates\n", " # weight_decay: L2 regularization penalty to prevent overfitting\n", " optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)\n", " \n", " # Learning rate scheduler: reduces LR when validation loss plateaus\n", " # patience=3: wait 3 epochs before reducing, factor=0.5: halve the learning rate\n", " # This helps fine-tune the model when learning slows down\n", " scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3, factor=0.5)\n", " \n", " # Metric tracking for training analysis\n", " train_losses = []\n", " val_losses = []\n", " val_accuracies = []\n", " \n", " # Early stopping parameters to prevent overfitting\n", " best_val_acc = 0 # Track best validation accuracy\n", " patience = 5 # Number of epochs to wait without improvement\n", " patience_counter = 0 # Current count of epochs without improvement\n", " \n", " for epoch in range(num_epochs):\n", " # Training phase: model learns from training data\n", " model.train() # Enable dropout and batch normalization training behavior\n", " total_train_loss = 0\n", " train_correct = 0\n", " train_total = 0\n", " \n", " for batch_idx, (data, target) in enumerate(train_loader):\n", " # Move data to computation device (GPU/MPS/CPU)\n", " data, target = data.to(device), target.to(device)\n", " \n", " # Clear gradients from previous iteration\n", " # PyTorch accumulates gradients by default, so this is essential\n", " optimizer.zero_grad()\n", " \n", " # Forward pass: compute model predictions\n", " output = model(data)\n", " \n", " # Compute loss between predictions and true labels\n", " loss = criterion(output, target)\n", " \n", " # Backward pass: compute gradients using backpropagation\n", " loss.backward()\n", " \n", " # Gradient clipping: 
prevent exploding gradients common in RNNs\n",
    "            # Clips the norm of gradients to maximum value of 1.0\n",
    "            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)\n",
    "\n",
    "            # Update model parameters using computed gradients\n",
    "            optimizer.step()\n",
    "\n",
    "            # Track training metrics\n",
    "            total_train_loss += loss.item()\n",
    "            _, predicted = torch.max(output, 1)\n",
    "            train_total += target.size(0)\n",
    "            train_correct += (predicted == target).sum().item()\n",
    "\n",
    "        # Calculate epoch training metrics\n",
    "        avg_train_loss = total_train_loss / len(train_loader)\n",
    "        train_accuracy = 100 * train_correct / train_total\n",
    "        train_losses.append(avg_train_loss)\n",
    "\n",
    "        # Validation phase: evaluate model on unseen data\n",
    "        model.eval()  # Disable dropout and batch normalization updates\n",
    "        total_val_loss = 0\n",
    "        correct = 0\n",
    "        total = 0\n",
    "\n",
    "        # Disable gradient computation for efficiency during validation\n",
    "        with torch.no_grad():\n",
    "            for data, target in val_loader:\n",
    "                data, target = data.to(device), target.to(device)\n",
    "                output = model(data)\n",
    "                loss = criterion(output, target)\n",
    "                total_val_loss += loss.item()\n",
    "\n",
    "                # Calculate accuracy: find class with highest score\n",
    "                _, predicted = torch.max(output, 1)\n",
    "                total += target.size(0)\n",
    "                correct += (predicted == target).sum().item()\n",
    "\n",
    "        # Calculate validation metrics\n",
    "        avg_val_loss = total_val_loss / len(val_loader)\n",
    "        val_accuracy = 100 * correct / total\n",
    "\n",
    "        # Store metrics for analysis\n",
    "        val_losses.append(avg_val_loss)\n",
    "        val_accuracies.append(val_accuracy)\n",
    "\n",
    "        # Update learning rate based on validation performance\n",
    "        scheduler.step(avg_val_loss)\n",
    "        current_lr = optimizer.param_groups[0]['lr']\n",
    "\n",
    "        # Report progress for this epoch\n",
    "        print(f\"Epoch {epoch + 1}/{num_epochs} | \"\n",
    "              f\"train loss {avg_train_loss:.4f}, acc {train_accuracy:.2f}% | \"\n",
    "              f\"val loss {avg_val_loss:.4f}, acc {val_accuracy:.2f}% | \"\n",
    "              f\"lr {current_lr:.6f}\")\n",
    "\n",
    "        # Early stopping logic: stop training if no improvement\n",
    "        if val_accuracy > best_val_acc:\n",
    "            best_val_acc = val_accuracy\n",
    "            patience_counter = 0  # Reset counter on improvement\n",
    "        
else:\n",
    "            patience_counter += 1  # Increment counter when no improvement\n",
    "\n",
    "        # Stop training if no improvement for patience epochs\n",
    "        if patience_counter >= patience:\n",
    "            break\n",
    "\n",
    "    return train_losses, val_losses, val_accuracies\n",
    "\n",
    "def test_model(model, test_loader):\n",
    "    \"\"\"\n",
    "    Evaluate trained model on test set with comprehensive metrics.\n",
    "\n",
    "    Provides detailed performance analysis including:\n",
    "    - Overall accuracy\n",
    "    - Per-class precision, recall, F1-score\n",
    "    - Prediction probabilities for confidence analysis\n",
    "    - Classification report with support counts\n",
    "\n",
    "    Args:\n",
    "        model: Trained PyTorch model\n",
    "        test_loader: DataLoader for test data\n",
    "\n",
    "    Returns:\n",
    "        tuple: (accuracy, predictions, targets, probabilities)\n",
    "    \"\"\"\n",
    "    model.eval()  # Set model to evaluation mode\n",
    "    all_predictions = []\n",
    "    all_targets = []\n",
    "    all_probabilities = []\n",
    "\n",
    "    # Collect predictions and probabilities for all test samples\n",
    "    with torch.no_grad():\n",
    "        for data, target in test_loader:\n",
    "            data, target = data.to(device), target.to(device)\n",
    "            output = model(data)\n",
    "\n",
    "            # Convert logits to probabilities using softmax\n",
    "            probabilities = torch.softmax(output, dim=1)\n",
    "\n",
    "            # Get predicted class (highest probability)\n",
    "            _, predicted = torch.max(output, 1)\n",
    "\n",
    "            # Store results for metric calculation\n",
    "            all_predictions.extend(predicted.cpu().numpy())\n",
    "            all_targets.extend(target.cpu().numpy())\n",
    "            all_probabilities.extend(probabilities.cpu().numpy())\n",
    "\n",
    "    # Calculate overall accuracy\n",
    "    accuracy = accuracy_score(all_targets, all_predictions)\n",
    "\n",
    "    # Generate detailed classification report\n",
    "    # zero_division=0 handles edge case where no samples predicted for a class\n",
    "    try:\n",
    "        report = classification_report(all_targets, all_predictions,\n",
    "                                       target_names=['Negative', 'Positive'],\n",
    "                                       zero_division=0)\n",
    "    except 
Exception as e:\n",
    "        report = f\"Error generating classification report: {e}\"\n",
    "\n",
    "    # Show overall accuracy and the per-class report\n",
    "    print(f\"Test accuracy: {accuracy:.4f}\")\n",
    "    print(report)\n",
    "\n",
    "    return accuracy, all_predictions, all_targets, all_probabilities\n",
    "\n",
    "def demonstrate_predictions(model, vocab_to_idx, sample_texts):\n",
    "    \"\"\"\n",
    "    Demonstrate model predictions on sample texts to verify functionality.\n",
    "\n",
    "    This function shows:\n",
    "    - Text preprocessing pipeline in action\n",
    "    - Model inference process\n",
    "    - Confidence scores and probability distributions\n",
    "    - Prediction interpretability\n",
    "\n",
    "    Args:\n",
    "        model: Trained PyTorch model\n",
    "        vocab_to_idx: Vocabulary mapping for text preprocessing\n",
    "        sample_texts: List of example texts to classify\n",
    "    \"\"\"\n",
    "    model.eval()  # Ensure model is in evaluation mode\n",
    "    max_length = 50  # Must match training sequence length\n",
    "\n",
    "    # Look up special-token indices; the fallbacks mirror build_vocabulary\n",
    "    # (index 0 for padding, index 1 for unknown words)\n",
    "    pad_idx = vocab_to_idx.get('<PAD>', 0)\n",
    "    unk_idx = vocab_to_idx.get('<UNK>', 1)\n",
    "\n",
    "    with torch.no_grad():\n",
    "        for text in sample_texts:\n",
    "            # Preprocess text using same pipeline as training\n",
    "            tokens = text.lower().split()\n",
    "\n",
    "            # Convert words to vocabulary indices\n",
    "            # Uses <UNK> token for words not seen during training\n",
    "            token_ids = [vocab_to_idx.get(token, unk_idx) for token in tokens]\n",
    "\n",
    "            # Apply same padding/truncation as training data\n",
    "            if len(token_ids) < max_length:\n",
    "                # Pad with <PAD> tokens to reach max_length\n",
    "                token_ids.extend([pad_idx] * (max_length - len(token_ids)))\n",
    "            else:\n",
    "                # Truncate to max_length\n",
    "                token_ids = token_ids[:max_length]\n",
    "\n",
    "            # Convert to tensor and add batch dimension for model input\n",
    "            input_tensor = torch.tensor([token_ids], dtype=torch.long).to(device)\n",
    "\n",
    "            # Get model predictions (raw logits)\n",
    "            output = model(input_tensor)\n",
    "\n",
    "            # Convert logits to probabilities using softmax\n",
    "            probabilities = torch.softmax(output, dim=1)\n",
    "\n",
    "            # Extract prediction and confidence\n",
    "            predicted_class = torch.argmax(output, dim=1).item()\n",
    "            confidence = probabilities[0][predicted_class].item()\n",
    "\n",
    "            # Convert numerical prediction to human-readable label\n",
    "            sentiment = \"Positive\" if predicted_class == 1 else \"Negative\"\n",
    "\n",
    "            # Report the prediction alongside its confidence\n",
    "            print(f\"{sentiment} ({confidence:.2%}): {text}\")\n",
    "\n",
    "def main():\n",
    "    \"\"\"\n",
    "    Main execution function orchestrating the complete sequence modeling pipeline.\n",
    "\n",
    "    Pipeline overview:\n",
    "    1. Set random seeds for reproducible experiments\n",
    "    2. Generate realistic movie review dataset\n",
    "    3. Build vocabulary from the full text corpus\n",
    "    4. Create train/validation/test splits with stratification\n",
    "    5. Initialize PyTorch datasets and data loaders\n",
    "    6. Define and initialize LSTM model architecture\n",
    "    7. Train model with validation monitoring\n",
    "    8. Evaluate on test set with comprehensive metrics\n",
    "    9. Demonstrate predictions on sample texts\n",
    "    10. Visualize training progress and performance\n",
    "    \"\"\"\n",
    "    # Set random seeds for reproducible results across runs\n",
    "    # Essential for comparing different model configurations\n",
    "    torch.manual_seed(42)\n",
    "    np.random.seed(42)\n",
    "    random.seed(42)\n",
    "\n",
    "    # Generate synthetic movie review dataset\n",
    "    # Larger dataset (2000 samples) provides more training examples\n",
    "    texts, labels = create_realistic_movie_dataset(num_samples=2000)\n",
    "\n",
    "    # Build vocabulary from all text data\n",
    "    # min_freq=2 filters very rare words that might not generalize\n",
    "    vocab_to_idx = build_vocabulary(texts, min_freq=2)\n",
    "\n",
    "    # Create stratified train/validation/test splits\n",
    "    # Stratification maintains class balance across all splits\n",
    "    # 60% train, 20% validation, 20% test\n",
    "    X_temp, X_test, y_temp, y_test = train_test_split(texts, labels, test_size=0.2, random_state=42, stratify=labels)\n",
    "    X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)\n",
    "\n",
    "    # Create PyTorch datasets with text preprocessing\n",
    "    train_dataset = TextDataset(X_train, y_train, vocab_to_idx)\n",
    "    val_dataset = TextDataset(X_val, y_val, 
vocab_to_idx)\n",
    "    test_dataset = TextDataset(X_test, y_test, vocab_to_idx)\n",
    "\n",
    "    # Create data loaders for efficient batch processing\n",
    "    # batch_size=32: balance between memory usage and gradient stability\n",
    "    # shuffle=True for training: randomizes order to improve learning\n",
    "    # shuffle=False for validation/test: ensures consistent evaluation\n",
    "    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)\n",
    "    val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)\n",
    "    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)\n",
    "\n",
    "    # Define model architecture hyperparameters\n",
    "    vocab_size = len(vocab_to_idx)  # Size of vocabulary (including special tokens)\n",
    "    embedding_dim = 128  # Word embedding dimension (semantic feature size)\n",
    "    hidden_dim = 64  # LSTM hidden state dimension (memory capacity)\n",
    "    num_layers = 2  # Number of stacked LSTM layers (model depth)\n",
    "    num_classes = 2  # Binary classification (positive/negative sentiment)\n",
    "\n",
    "    # Initialize model and move to computation device\n",
    "    model = LSTMClassifier(vocab_size, embedding_dim, hidden_dim, num_layers, num_classes, dropout=0.2)\n",
    "    model.to(device)\n",
    "\n",
    "    # Verify model architecture with sample input\n",
    "    sample_batch = next(iter(train_loader))\n",
    "    sample_input, sample_target = sample_batch\n",
    "    sample_input = sample_input.to(device)\n",
    "\n",
    "    with torch.no_grad():\n",
    "        sample_output = model(sample_input)\n",
    "    # Verify output shape matches expected dimensions: (batch_size, num_classes)\n",
    "    assert sample_output.shape == (sample_input.size(0), num_classes)\n",
    "\n",
    "    # Train model with validation monitoring\n",
    "    train_losses, val_losses, val_accuracies = train_model(model, train_loader, val_loader, num_epochs=20)\n",
    "\n",
    "    # Evaluate final performance on test set\n",
    "    test_accuracy, predictions, targets, probabilities = test_model(model, test_loader)\n",
    "\n",
    "    # Demonstrate predictions on hand-crafted examples\n",
    "    # These examples test model's ability to distinguish clear sentiment 
patterns\n",
    "    sample_texts = [\n",
    "        \"This movie is absolutely fantastic and amazing! I loved every minute of it.\",\n",
    "        \"Terrible boring film, complete waste of time. I hated everything about it.\",\n",
    "        \"Excellent story with wonderful acting and brilliant performance throughout.\",\n",
    "        \"Awful movie with horrible dialogue. Disappointed and would not recommend.\",\n",
    "        \"The cinematography was superb and the plot was incredible. Highly recommend!\",\n",
    "        \"Poor script with terrible acting. One of the worst films I have ever seen.\"\n",
    "    ]\n",
    "\n",
    "    demonstrate_predictions(model, vocab_to_idx, sample_texts)\n",
    "\n",
    "    # Visualize training progress and final performance\n",
    "    plt.figure(figsize=(15, 5))\n",
    "\n",
    "    # Training and validation loss curves\n",
    "    plt.subplot(1, 3, 1)\n",
    "    plt.plot(train_losses, label='Train Loss', color='blue')\n",
    "    plt.plot(val_losses, label='Validation Loss', color='red')\n",
    "    plt.title('Training and Validation Loss')\n",
    "    plt.xlabel('Epoch')\n",
    "    plt.ylabel('Loss')\n",
    "    plt.legend()\n",
    "    plt.grid(True)\n",
    "\n",
    "    # Validation accuracy curve\n",
    "    plt.subplot(1, 3, 2)\n",
    "    plt.plot(val_accuracies, label='Validation Accuracy', color='green')\n",
    "    plt.title('Validation Accuracy')\n",
    "    plt.xlabel('Epoch')\n",
    "    plt.ylabel('Accuracy (%)')\n",
    "    plt.legend()\n",
    "    plt.grid(True)\n",
    "\n",
    "    # Confusion matrix for final test performance\n",
    "    plt.subplot(1, 3, 3)\n",
    "    predictions_array = np.array(predictions)\n",
    "    targets_array = np.array(targets)\n",
    "\n",
    "    from sklearn.metrics import confusion_matrix\n",
    "    # Create confusion matrix to visualize classification performance\n",
    "    # Shows true positives, false positives, true negatives, false negatives\n",
    "    cm = confusion_matrix(targets_array, predictions_array)\n",
    "    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',\n",
    "                xticklabels=['Negative', 'Positive'],\n",
    "                yticklabels=['Negative', 'Positive'])\n",
    "    plt.title('Confusion Matrix')\n",
    "    
plt.ylabel('True Label')\n",
    "    plt.xlabel('Predicted Label')\n",
    "\n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "\n",
    "# Module-level entry point: must sit outside main() so the pipeline actually runs\n",
    "if __name__ == \"__main__\":\n",
    "    main()" ] }, { "cell_type": "code", "execution_count": null, "id": "578b4d5b-43d9-4b88-a400-a15c1a4e8e8f", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 5 }