# Sentiment Analysis Model (SAM)
A sentiment analysis model built using the Burn deep learning framework in Rust, fine-tuned on the MTEB Tweet Sentiment Extraction dataset and exposed via a Rocket API.
## 🧠 Model Details
- Architecture: Transformer encoder with 4 layers, 8 attention heads, d_model=256, and d_ff=1024, matching the learner summary below (see the Burn sketch after this list).
- Embeddings: Learned token embeddings (28,996-entry BERT cased vocabulary) and positional embeddings, with a maximum sequence length of 256.
- Output Layer: Linear layer mapping to 3 sentiment classes: Negative, Neutral, Positive.
- Activation Function: Softmax over the three classes at inference time; during training the raw logits feed the cross-entropy loss.
- Dropout: Two dropout layers with rate 0.1 (one after the embeddings, one before the output layer) to reduce overfitting.
- Training Framework: Burn in Rust.
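
For reference, here is a minimal sketch of how this architecture maps onto Burn's `nn` modules. The struct fields mirror the learner summary in the Evaluation Metrics section; the padding mask is omitted and method names (e.g. `repeat_dim`) follow recent Burn releases, so treat this as a sketch rather than the exact training code:

```rust
use burn::{
    nn::{
        transformer::{TransformerEncoder, TransformerEncoderConfig, TransformerEncoderInput},
        Dropout, DropoutConfig, Embedding, EmbeddingConfig, Linear, LinearConfig,
    },
    prelude::*,
};

/// Mirrors the learner summary: 4 layers, 8 heads, d_model=256, d_ff=1024.
#[derive(Module, Debug)]
pub struct TextClassificationModel<B: Backend> {
    transformer: TransformerEncoder<B>,
    embedding_token: Embedding<B>,
    embedding_pos: Embedding<B>,
    embed_dropout: Dropout,
    output_dropout: Dropout,
    output: Linear<B>,
    n_classes: usize,
    max_seq_length: usize,
}

impl<B: Backend> TextClassificationModel<B> {
    pub fn new(device: &B::Device) -> Self {
        let transformer = TransformerEncoderConfig::new(256, 1024, 8, 4)
            .with_dropout(0.1)
            .with_norm_first(true)
            .with_quiet_softmax(true)
            .init(device);
        Self {
            transformer,
            embedding_token: EmbeddingConfig::new(28_996, 256).init(device), // BERT cased vocab
            embedding_pos: EmbeddingConfig::new(256, 256).init(device),      // learned positions
            embed_dropout: DropoutConfig::new(0.1).init(),
            output_dropout: DropoutConfig::new(0.1).init(),
            output: LinearConfig::new(256, 3).init(device), // 3 sentiment classes
            n_classes: 3,
            max_seq_length: 256,
        }
    }

    /// tokens: [batch, seq] of token ids; returns [batch, n_classes] logits.
    pub fn forward(&self, tokens: Tensor<B, 2, Int>) -> Tensor<B, 2> {
        let [batch, seq] = tokens.dims();
        let device = tokens.device();

        // Sum of learned token and position embeddings.
        let positions = Tensor::<B, 1, Int>::arange(0..seq as i64, &device)
            .reshape([1, seq])
            .repeat_dim(0, batch);
        let x = self.embedding_token.forward(tokens) + self.embedding_pos.forward(positions);
        let x = self.embed_dropout.forward(x);

        // Padding mask omitted for brevity; real batches should attach one
        // via TransformerEncoderInput::mask_pad.
        let encoded = self.transformer.forward(TransformerEncoderInput::new(x));

        // Classify from the first token's representation: [batch, 1, 256] -> [batch, 256].
        let cls = encoded.slice([0..batch, 0..1]).reshape([batch, 256]);
        self.output.forward(self.output_dropout.forward(cls))
    }
}
```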
## 📊 Training Data
- Dataset: MTEB Tweet Sentiment Extraction
- Size: 100,000 training samples.
- Preprocessing: Tokenized with the BertCasedTokenizer (see the sketch after this list).
- Batching: Mini-batch gradient descent with a batch size of 32.
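
For illustration, the tokenization step can be reproduced with the Hugging Face `tokenizers` crate, which Burn's text-classification example wraps as `BertCasedTokenizer`; the `from_pretrained` call assumes the crate's `http` feature is enabled:

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the pretrained BERT cased tokenizer (28,996-token WordPiece vocabulary).
    let tokenizer = Tokenizer::from_pretrained("bert-base-cased", None)?;

    // Encode one tweet; `true` adds the [CLS]/[SEP] special tokens.
    let encoding = tokenizer.encode("I love the new features in this app!", true)?;
    let ids: Vec<u32> = encoding.get_ids().to_vec();
    println!("{} tokens: {:?}", ids.len(), ids);
    Ok(())
}
```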
## ⚙️ Training Configuration
- Optimizer: AdamW with weight decay 0.01 and a base learning rate of 1e-4; decoupled weight decay is a solid default for transformer training.
- Learning Rate Scheduler: Noam scheduler with 5,000 warm-up steps (linear warm-up, then inverse-square-root decay), the schedule originally proposed for transformers.
- Loss Function: CrossEntropyLoss with label smoothing (0.1) and class balancing.
- Gradient Clipping: Applied with a maximum norm of 1.0.
- Early Stopping: Implemented with a patience of 2 epochs.
- Epochs: Trained for up to 5 epochs with early stopping based on validation loss.
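
As a rough sketch, these settings map onto Burn's `optim`, `lr_scheduler`, and `loss` configs as below. Exact signatures vary between Burn releases (e.g. whether `init` returns a `Result`), and the class weights shown are illustrative placeholders, not the values used in training:

```rust
use burn::grad_clipping::GradientClippingConfig;
use burn::lr_scheduler::noam::{NoamLrScheduler, NoamLrSchedulerConfig};
use burn::nn::loss::{CrossEntropyLoss, CrossEntropyLossConfig};
use burn::optim::AdamWConfig;
use burn::prelude::*;

fn training_components<B: Backend>(
    device: &B::Device,
) -> (AdamWConfig, NoamLrScheduler, CrossEntropyLoss<B>) {
    // AdamW with decoupled weight decay and gradient-norm clipping at 1.0;
    // the config is later handed to Burn's LearnerBuilder.
    let optimizer = AdamWConfig::new()
        .with_weight_decay(0.01)
        .with_grad_clipping(Some(GradientClippingConfig::Norm(1.0)));

    // Noam schedule: linear warm-up over 5,000 steps, then inverse-square-root
    // decay, scaled by d_model = 256; 1e-4 sets the learning-rate scale.
    let scheduler = NoamLrSchedulerConfig::new(1e-4)
        .with_warmup_steps(5_000)
        .with_model_size(256)
        .init();

    // Cross-entropy with label smoothing 0.1. Class balancing is expressed as
    // per-class weights; the values below are placeholders, not the weights
    // actually used in training.
    let loss = CrossEntropyLossConfig::new()
        .with_smoothing(Some(0.1))
        .with_weights(Some(vec![1.0, 1.0, 1.0]))
        .init(device);

    (optimizer, scheduler, loss)
}
```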
## 📈 Evaluation Metrics
- Learner Summary:

```text
TextClassificationModel {
  transformer: TransformerEncoder {d_model: 256, d_ff: 1024, n_heads: 8, n_layers: 4, dropout: 0.1, norm_first: true, quiet_softmax: true, params: 3159040}
  embedding_token: Embedding {n_embedding: 28996, d_model: 256, params: 7422976}
  embedding_pos: Embedding {n_embedding: 256, d_model: 256, params: 65536}
  embed_dropout: Dropout {prob: 0.1}
  output_dropout: Dropout {prob: 0.1}
  output: Linear {d_input: 256, d_output: 3, bias: true, params: 771}
  n_classes: 3
  max_seq_length: 256
  params: 10648323
}
```
| Split | Metric | Min. | Epoch | Max. | Epoch |
|---|---|---|---|---|---|
| Train | Loss | 1.120 | 5 | 1.171 | 1 |
| Train | Accuracy (%) | 33.743 | 2 | 37.814 | 1 |
| Train | Learning Rate | 2.763e-8 | 1 | 7.648e-8 | 2 |
| Valid | Loss | 1.102 | 4 | 1.110 | 1 |
| Valid | Accuracy (%) | 32.760 | 2 | 36.900 | 5 |
- TODO:
- Tweak hyperparameters (e.g., a higher peak learning rate or more epochs) to alleviate the underfitting suggested by the near-chance accuracy above.
- Enhance logging and monitoring.
## 🚀 Usage
- API Endpoint: `POST /predict` (see the Rocket handler sketch after this list).
- Example Request:

```json
{
  "text": "I love the new features in this app!"
}
```
- Example Response:

```json
{
  "sentiment": "Positive"
}
```
- Steps to Run: TODO once the service is dockerized and deployed to Hugging Face Spaces.
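
In the meantime, here is a minimal sketch of the `/predict` endpoint in Rocket 0.5 (with its `json` feature enabled); `run_model` is a hypothetical placeholder for the real tokenize-and-predict pipeline:

```rust
#[macro_use]
extern crate rocket;

use rocket::serde::{json::Json, Deserialize, Serialize};

#[derive(Deserialize)]
#[serde(crate = "rocket::serde")]
struct PredictRequest {
    text: String,
}

#[derive(Serialize)]
#[serde(crate = "rocket::serde")]
struct PredictResponse {
    sentiment: String,
}

// Hypothetical placeholder: the real handler tokenizes the text, runs the
// Burn model, and maps the argmax logit to Negative/Neutral/Positive.
fn run_model(_text: &str) -> String {
    "Positive".to_string()
}

#[post("/predict", format = "json", data = "<req>")]
fn predict(req: Json<PredictRequest>) -> Json<PredictResponse> {
    Json(PredictResponse {
        sentiment: run_model(&req.text),
    })
}

#[launch]
fn rocket() -> _ {
    rocket::build().mount("/", routes![predict])
}
```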