WarriorsSami committed
Commit 3c4dee5 · 1 Parent(s): 17101fd

feat: add new model version alongside docs

README.md CHANGED
@@ -9,7 +9,68 @@ base_model:
  ---
  # Sentiment Analysis Model (SAM)
 
- ## Technologies Used
- - Rust
- - Burn
- - Rocket
+ A sentiment analysis model built using the [Burn](https://burn.dev/) deep learning framework in Rust, fine-tuned on the [MTEB Tweet Sentiment Extraction](https://huggingface.co/WarriorsSami/sentiment-analysis-model/tree/main#:~:text=tweet_sentiment_extraction) dataset and exposed via a [Rocket](https://rocket.rs/guide/v0.5/introduction/#introduction) API.
+
+ ## 🧠 Model Details
+ - **Architecture**: Transformer Encoder with 4 layers, 8 attention heads, d_model=256, and d_ff=1024 (see the learner summary under Evaluation Metrics).
+ - **Embeddings**: Token and positional embeddings with a maximum sequence length of 256.
+ - **Output Layer**: Linear layer mapping to 3 sentiment classes: Negative, Neutral, Positive.
+ - **Output Activation**: Softmax over the class logits for multi-class classification.
+ - **Dropout**: Two dropout layers with a rate of 0.1 (one after the embeddings, one before the output layer) to reduce overfitting.
+ - **Training Framework**: Burn in Rust (a rough configuration sketch follows this list).
+
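+ As a concrete illustration, a Burn encoder configuration matching the hyperparameters above could be written roughly as follows. This is a minimal sketch assuming Burn's `TransformerEncoderConfig` builder (as in Burn's text-classification example), not the code in this repository; builder method names may differ across Burn versions.
+ ```rust
+ use burn::nn::transformer::TransformerEncoderConfig;
+
+ // Sketch only: mirrors the hyperparameters listed above
+ // (d_model = 256, d_ff = 1024, 8 attention heads, 4 layers, dropout 0.1).
+ fn encoder_config() -> TransformerEncoderConfig {
+     TransformerEncoderConfig::new(256, 1024, 8, 4)
+         .with_dropout(0.1)
+         .with_norm_first(true)    // matches `norm_first: true` in the learner summary
+         .with_quiet_softmax(true) // matches `quiet_softmax: true` in the learner summary
+ }
+ ```
+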
+ ## 📚 Training Data
+ - **Dataset**: MTEB Tweet Sentiment Extraction.
+ - **Size**: 100,000 training samples, drawn with a `SamplerDataset` (see `trainer/src/training.rs` below).
+ - **Preprocessing**: Tokenization with the cased BERT tokenizer (`BertCasedTokenizer`).
+ - **Batching**: Mini-batch gradient descent with a batch size of 32.
+
+ ## ⚙️ Training Configuration
+ - **Optimizer**: AdamW with weight decay 0.01 and a base learning rate of 1e-4; decoupled weight decay is a solid default for transformer training.
+ - **Learning Rate Scheduler**: Noam scheduler with 5,000 warm-up steps, the standard warm-up-then-decay schedule for transformers (a condensed sketch follows this list).
+ - **Loss Function**: CrossEntropyLoss with label smoothing (0.1) and class balancing.
+ - **Gradient Clipping**: Applied with a maximum norm of 1.0.
+ - **Early Stopping**: Patience of 2 epochs on the validation loss.
+ - **Epochs**: Trained for up to 5 epochs, subject to early stopping.
+
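+ A condensed sketch of two of these settings is shown below: the gradient-clipping choice recorded in `sam-artifacts/config.json` and the Noam scheduler call from `trainer/src/training.rs`, both visible later in this diff. Treat the import paths as assumptions; only the values are taken from this commit.
+ ```rust
+ use burn::grad_clipping::GradientClippingConfig;
+ use burn::lr_scheduler::noam::NoamLrSchedulerConfig;
+
+ fn main() {
+     // Norm-based gradient clipping with a maximum L2 norm of 1.0, matching
+     // the {"Norm": 1.0} entry in sam-artifacts/config.json below.
+     let _clipping = GradientClippingConfig::Norm(1.0);
+
+     // Noam schedule, condensed from trainer/src/training.rs in this commit:
+     // 1e-4 base learning rate, 5,000 warm-up steps, scaled by d_model (256).
+     let _scheduler = NoamLrSchedulerConfig::new(1e-4)
+         .with_warmup_steps(5_000)
+         .with_model_size(256)
+         .init()
+         .unwrap();
+ }
+ ```
+ With 100,000 sampled training examples and a batch size of 32, an epoch is roughly 3,125 optimizer steps, so the 5,000 warm-up steps extend into the second epoch.
+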
+ ## 📈 Evaluation Metrics
+ - **Learner Summary**:
+ ```text
+ TextClassificationModel {
+   transformer: TransformerEncoder {d_model: 256, d_ff: 1024, n_heads: 8, n_layers: 4, dropout: 0.1, norm_first: true, quiet_softmax: true, params: 3159040}
+   embedding_token: Embedding {n_embedding: 28996, d_model: 256, params: 7422976}
+   embedding_pos: Embedding {n_embedding: 256, d_model: 256, params: 65536}
+   embed_dropout: Dropout {prob: 0.1}
+   output_dropout: Dropout {prob: 0.1}
+   output: Linear {d_input: 256, d_output: 3, bias: true, params: 771}
+   n_classes: 3
+   max_seq_length: 256
+   params: 10648323
+ }
+ ```
+ | Split | Metric        | Min.     | Epoch | Max.     | Epoch |
+ |-------|---------------|----------|-------|----------|-------|
+ | Train | Loss          | 1.120    | 5     | 1.171    | 1     |
+ | Train | Accuracy (%)  | 33.743   | 2     | 37.814   | 1     |
+ | Train | Learning Rate | 2.763e-8 | 1     | 7.648e-8 | 2     |
+ | Valid | Loss          | 1.102    | 4     | 1.110    | 1     |
+ | Valid | Accuracy (%)  | 32.760   | 2     | 36.900   | 5     |
+ - **TODO**:
+   - Tweak hyperparameters to alleviate underfitting.
+   - Enhance logging and monitoring.
+
+ ## 🚀 Usage
+ - **API Endpoint**: `/predict` (a hypothetical Rocket handler sketch follows this section).
+ - **Example Request**:
+   ```json
+   {
+     "text": "I love the new features in this app!"
+   }
+   ```
+ - **Example Response**:
+   ```json
+   {
+     "sentiment": "Positive"
+   }
+   ```
+ - **Steps to Run**: *TODO* after dockerizing and deploying to Hugging Face Spaces.
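+
+ For illustration, a Rocket 0.5 handler matching the request/response shapes above could look roughly like this. It is a hypothetical sketch (the route is assumed to be a POST with a JSON body, and the model call is stubbed out), not the actual server code in this repository.
+ ```rust
+ use rocket::serde::{json::Json, Deserialize, Serialize};
+
+ #[derive(Deserialize)]
+ #[serde(crate = "rocket::serde")]
+ struct PredictRequest {
+     text: String,
+ }
+
+ #[derive(Serialize)]
+ #[serde(crate = "rocket::serde")]
+ struct PredictResponse {
+     sentiment: String,
+ }
+
+ // Hypothetical endpoint: the real service would run the Burn model on
+ // `request.text` and map the argmax class to Negative / Neutral / Positive.
+ #[rocket::post("/predict", data = "<request>")]
+ fn predict(request: Json<PredictRequest>) -> Json<PredictResponse> {
+     let _input = &request.text; // placeholder for tokenization + inference
+     Json(PredictResponse {
+         sentiment: "Positive".to_string(),
+     })
+ }
+
+ #[rocket::launch]
+ fn rocket() -> _ {
+     rocket::build().mount("/", rocket::routes![predict])
+ }
+ ```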
sam-artifacts/config.json CHANGED
@@ -16,12 +16,14 @@
  },
  "optimizer": {
  "weight_decay": {
- "penalty": 0.00005
+ "penalty": 0.01
+ },
+ "grad_clipping": {
+ "Norm": 1.0
  },
- "grad_clipping": null,
  "beta_1": 0.9,
  "beta_2": 0.999,
- "epsilon": 0.00001
+ "epsilon": 1e-8
  },
  "max_seq_length": 256,
  "batch_size": 32,
sam-artifacts/model.mpk CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c1d72b3b82ff9c868352a6f30e31e45e339ab9ae93c6e4ed38684ddd385bddd9
- size 21302041
+ oid sha256:0f2e7709ec07fca095c996c0d7bd73e59974fb892d567ae00490c44e4e17efc8
+ size 21302072
trainer/src/training.rs CHANGED
@@ -42,7 +42,7 @@ pub fn train<B: AutodiffBackend, D: TextClassificationDataset + 'static>(
  let batcher = TextClassificationBatcher::new(tokenizer.clone(), config.max_seq_length);
 
  // Create data samplers for training and testing datasets
- let train_sampler = SamplerDataset::new(dataset_train, 50_000);
+ let train_sampler = SamplerDataset::new(dataset_train, 100_000);
  let test_sampler = SamplerDataset::new(dataset_test, 5_000);
 
  // Initialize model
@@ -69,7 +69,7 @@ pub fn train<B: AutodiffBackend, D: TextClassificationDataset + 'static>(
 
  // Initialize learning rate scheduler
  let lr_scheduler = NoamLrSchedulerConfig::new(1e-4)
- .with_warmup_steps(8_000)
+ .with_warmup_steps(5_000)
  .with_model_size(config.transformer.d_model)
  .init()
  .unwrap();
  .unwrap();