license: mit
tags:
- sentiment-analysis
- text-classification
- electra
- pytorch
- transformers
Electra Base Classifier for Sentiment Analysis
This is an ELECTRA base discriminator fine-tuned for sentiment analysis of reviews. It uses a mean pooling layer and a classifier head (two hidden layers of dimension 1024) with SwishGLU activation and dropout (0.3). It classifies text into three sentiment categories: 'negative' (0), 'neutral' (1), and 'positive' (2). It was fine-tuned on the Sentiment Merged dataset, which merges the Stanford Sentiment Treebank (SST-3) with DynaSent Rounds 1 and 2.
Labels
The model predicts the following labels:
- 0: negative
- 1: neutral
- 2: positive
How to Use
Install package
This model requires the classes in electra_classifier.py. You can download the file, or you can install the package from PyPI.
pip install electra-classifier
Load classes and model
# Install the package in a notebook
!pip install electra-classifier
# Import libraries
import torch
from transformers import AutoTokenizer
from electra_classifier import ElectraClassifier
# Load tokenizer and model
model_name = "jbeno/electra-base-classifier-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = ElectraClassifier.from_pretrained(model_name)
# Set model to evaluation mode
model.eval()
# Run inference
text = "I love this restaurant!"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)
predicted_class_id = torch.argmax(logits, dim=1).item()
predicted_label = model.config.id2label[predicted_class_id]
print(f"Predicted label: {predicted_label}")
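The snippet above reports only the top label. If per-label probabilities are also useful, the logits can be passed through a softmax. The following is a minimal sketch in plain Python: the `ID2LABEL` mapping is copied from the Labels section of this card, and `example_logits` is a made-up row standing in for something like `logits[0].tolist()`.

```python
import math

# Label mapping as documented in this model card
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits):
    """Return (top label, {label: probability}) for one row of logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], {ID2LABEL[i]: p for i, p in enumerate(probs)}


# Hypothetical logits for a single input (illustrative values only)
example_logits = [-1.2, 0.3, 2.5]
label, probabilities = describe_prediction(example_logits)
print(label, probabilities)
```

With tensors, `torch.softmax(logits, dim=1)` gives the same probabilities directly.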
Requirements
- Python 3.7+
- PyTorch
- Transformers
- electra-classifier - Install with pip, or download electra_classifier.py
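A quick way to confirm these requirements before loading the model is to check the interpreter version and whether each module can be found. This is only a sketch using the standard library; the module names are the import names used in the usage snippet of this card, and `check_environment` is a hypothetical helper, not part of the package.

```python
import importlib.util
import sys

# Import names taken from the usage snippet in this card
REQUIRED_MODULES = ("torch", "transformers", "electra_classifier")


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 7)):
    """Report whether the interpreter and each required module are available."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, ok in check_environment().items():
    print(f"{item}: {'ok' if ok else 'missing'}")
```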
Training Details
Dataset
The model was trained on the Sentiment Merged dataset, which is a mix of Stanford Sentiment Treebank (SST-3), DynaSent Round 1, and DynaSent Round 2.
Code
The code used to train the model can be found on GitHub:
Research Paper
The research paper can be found here: ELECTRA and GPT-4o: Cost-Effective Partners for Sentiment Analysis
Performance
- Merged Dataset
  - Macro Average F1: 79.29
  - Accuracy: 79.69
- DynaSent R1
  - Macro Average F1: 82.10
  - Accuracy: 82.14
- DynaSent R2
  - Macro Average F1: 71.83
  - Accuracy: 71.94
- SST-3
  - Macro Average F1: 69.95
  - Accuracy: 78.24
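Macro average F1 and accuracy can diverge because macro F1 weights every class equally, while accuracy weights every example equally. The toy example below illustrates the arithmetic only; the labels and predictions are invented and say nothing about why any particular dataset above shows a gap.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples scores 0 here
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented data: the two 'neutral' (1) examples are both misclassified
y_true = [0, 0, 0, 0, 2, 2, 2, 2, 1, 1]
y_pred = [0, 0, 0, 0, 2, 2, 2, 2, 0, 2]

print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
```

Here 8 of 10 predictions are correct, but the class that is never predicted correctly contributes an F1 of 0, which pulls the macro average well below the accuracy.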
Model Architecture
- Base Model: ELECTRA base discriminator (google/electra-base-discriminator)
- Pooling Layer: Custom pooling layer supporting 'cls', 'mean', and 'max' pooling types.
- Classifier: Custom classifier with configurable hidden dimensions, number of layers, and dropout rate.
- Activation Function: Custom SwishGLU activation function.
ElectraClassifier(
  (electra): ElectraModel(
    (embeddings): ElectraEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): ElectraEncoder(
      (layer): ModuleList(
        (0-11): 12 x ElectraLayer(
          (attention): ElectraAttention(
            (self): ElectraSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): ElectraSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): ElectraIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): ElectraOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (pooling): PoolingLayer()
  (classifier): Classifier(
    (layers): Sequential(
      (0): Linear(in_features=768, out_features=1024, bias=True)
      (1): SwishGLU(
        (projection): Linear(in_features=1024, out_features=2048, bias=True)
        (activation): SiLU()
      )
      (2): Dropout(p=0.3, inplace=False)
      (3): Linear(in_features=1024, out_features=1024, bias=True)
      (4): SwishGLU(
        (projection): Linear(in_features=1024, out_features=2048, bias=True)
        (activation): SiLU()
      )
      (5): Dropout(p=0.3, inplace=False)
      (6): Linear(in_features=1024, out_features=3, bias=True)
    )
  )
)
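To make the printed structure easier to follow, the sketch below traces the tensor shape at each coarse stage of a forward pass. The sizes are read from the printout above; `trace_shapes` is a hypothetical helper written for illustration and does not run the model.

```python
# Sizes read from the printed architecture above
MAX_POSITIONS = 512   # position_embeddings: Embedding(512, 768)
HIDDEN_SIZE = 768     # ELECTRA hidden size
CLASSIFIER_DIM = 1024  # classifier hidden layers
N_CLASSES = 3         # final Linear(1024 -> 3)


def trace_shapes(batch_size, seq_len):
    """Return (stage, shape) pairs for one forward pass, as a rough guide."""
    if seq_len > MAX_POSITIONS:
        raise ValueError(f"seq_len {seq_len} exceeds {MAX_POSITIONS} positions")
    return [
        ("input_ids", (batch_size, seq_len)),
        ("electra last_hidden_state", (batch_size, seq_len, HIDDEN_SIZE)),
        ("pooling", (batch_size, HIDDEN_SIZE)),
        ("classifier hidden layers", (batch_size, CLASSIFIER_DIM)),
        ("logits", (batch_size, N_CLASSES)),
    ]


for stage, shape in trace_shapes(batch_size=4, seq_len=16):
    print(f"{stage:28s} {shape}")
```

Because the position embedding table has 512 rows, longer inputs need truncation, for example by passing `truncation=True, max_length=512` to the tokenizer.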
Custom Model Components
SwishGLU Activation Function
The SwishGLU activation function combines the Swish (SiLU) activation with a Gated Linear Unit (GLU). The input is projected to twice the output dimension and split into two halves; one half is passed through SiLU and multiplies the other half elementwise.
import torch.nn as nn

class SwishGLU(nn.Module):
    def __init__(self, input_dim: int, output_dim: int):
        super(SwishGLU, self).__init__()
        # One linear layer produces both the values and the gate
        self.projection = nn.Linear(input_dim, 2 * output_dim)
        self.activation = nn.SiLU()

    def forward(self, x):
        x_proj_gate = self.projection(x)
        # Split the last dimension into two equal halves
        projected, gate = x_proj_gate.tensor_split(2, dim=-1)
        return projected * self.activation(gate)
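The gating step in `forward` can be worked through by hand on a small vector. The sketch below reproduces only that arithmetic in plain Python, without the learned projection; `x_proj_gate` is an invented stand-in for the projection output with `output_dim = 2`.

```python
import math


def silu(v):
    """SiLU (Swish): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swish_glu_gate(x_proj_gate):
    """Split a vector in half and gate the first half with SiLU of the second."""
    if len(x_proj_gate) % 2 != 0:
        raise ValueError("expected an even number of values (2 * output_dim)")
    half = len(x_proj_gate) // 2
    projected, gate = x_proj_gate[:half], x_proj_gate[half:]
    return [p * silu(g) for p, g in zip(projected, gate)]


# Invented projection output: first half = values, second half = gate
x_proj_gate = [1.0, -2.0, 0.0, 3.0]
print(swish_glu_gate(x_proj_gate))
```

A gate value of 0 suppresses its paired value entirely (SiLU(0) = 0), while a large positive gate passes the value through almost unchanged in scale relative to the gate itself.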
PoolingLayer
The PoolingLayer class allows you to choose between different pooling strategies:
- cls: Uses the representation of the [CLS] token.
- mean: Calculates the mean of the token embeddings.
- max: Takes the maximum value across token embeddings.
'mean' pooling was used in the fine-tuned model.
import torch
import torch.nn as nn

class PoolingLayer(nn.Module):
    def __init__(self, pooling_type='cls'):
        super().__init__()
        self.pooling_type = pooling_type

    def forward(self, last_hidden_state, attention_mask):
        if self.pooling_type == 'cls':
            # Hidden state of the first ([CLS]) token
            return last_hidden_state[:, 0, :]
        elif self.pooling_type == 'mean':
            # Average over non-padded tokens only
            return (last_hidden_state * attention_mask.unsqueeze(-1)).sum(1) / attention_mask.sum(-1).unsqueeze(-1)
        elif self.pooling_type == 'max':
            # Padded positions are zeroed, not excluded, so 0 can win the max
            # for a feature whose real values are all negative
            return torch.max(last_hidden_state * attention_mask.unsqueeze(-1), dim=1)[0]
        else:
            raise ValueError(f"Unknown pooling method: {self.pooling_type}")
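A small worked example helps show what the attention mask does in the 'mean' and 'max' branches. The sketch below mirrors the arithmetic in plain Python for a single sequence; the token vectors are invented, and `masked_max_pool` is an alternative shown only for comparison, not part of the package.

```python
def mean_pool(token_vectors, mask):
    """Average only the token vectors whose mask value is 1."""
    dim = len(token_vectors[0])
    kept = sum(mask)
    return [
        sum(vec[d] * m for vec, m in zip(token_vectors, mask)) / kept
        for d in range(dim)
    ]


def zeroed_max_pool(token_vectors, mask):
    """Max over tokens after multiplying padded positions by 0 (the 'max' branch)."""
    dim = len(token_vectors[0])
    return [max(vec[d] * m for vec, m in zip(token_vectors, mask)) for d in range(dim)]


def masked_max_pool(token_vectors, mask):
    """Alternative for comparison: skip padded positions when taking the max."""
    dim = len(token_vectors[0])
    return [max(vec[d] for vec, m in zip(token_vectors, mask) if m) for d in range(dim)]


# Invented embeddings for three positions; the last one is padding
tokens = [[-1.0, 2.0], [-3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

print(mean_pool(tokens, mask))        # padding does not affect the average
print(zeroed_max_pool(tokens, mask))  # first feature becomes 0.0 from the zeroed pad
print(masked_max_pool(tokens, mask))  # first feature keeps the real maximum
```

Since the fine-tuned model uses 'mean' pooling, the difference between the two max variants does not affect its predictions.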
Classifier
The Classifier class is a customizable feed-forward neural network used for the final classification.
The fine-tuned model had:
- input_dim: 768
- num_layers: 2
- hidden_dim: 1024
- hidden_activation: SwishGLU
- dropout_rate: 0.3
- n_classes: 3
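import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, hidden_activation, num_layers, n_classes, dropout_rate=0.0):
        super().__init__()
        layers = []
        # First hidden layer: input_dim -> hidden_dim
        layers.append(nn.Linear(input_dim, hidden_dim))
        # hidden_activation is passed in as a module instance and appended as-is
        layers.append(hidden_activation)
        if dropout_rate > 0:
            layers.append(nn.Dropout(dropout_rate))
        # Remaining hidden layers: hidden_dim -> hidden_dim
        for _ in range(num_layers - 1):
            layers.append(nn.Linear(hidden_dim, hidden_dim))
            layers.append(hidden_activation)
            if dropout_rate > 0:
                layers.append(nn.Dropout(dropout_rate))
        # Output layer: hidden_dim -> n_classes
        layers.append(nn.Linear(hidden_dim, n_classes))
        self.layers = nn.Sequential(*layers)

    # Not shown in the original excerpt; assumed to apply the layers in order
    def forward(self, x):
        return self.layers(x)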
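To see how the constructor arguments translate into the layer stack, the sketch below mirrors the constructor's control flow but records layer descriptions as strings instead of building modules. `classifier_layout` is a hypothetical helper for illustration; the values in the example call are the ones listed above for the fine-tuned model.

```python
def classifier_layout(input_dim, hidden_dim, num_layers, n_classes,
                      dropout_rate=0.0, activation="SwishGLU"):
    """List the layers the constructor would build, in order, as strings."""
    layout = [f"Linear({input_dim} -> {hidden_dim})", activation]
    if dropout_rate > 0:
        layout.append(f"Dropout(p={dropout_rate})")
    # One extra Linear/activation/dropout group per additional hidden layer
    for _ in range(num_layers - 1):
        layout.append(f"Linear({hidden_dim} -> {hidden_dim})")
        layout.append(activation)
        if dropout_rate > 0:
            layout.append(f"Dropout(p={dropout_rate})")
    layout.append(f"Linear({hidden_dim} -> {n_classes})")
    return layout


for index, layer in enumerate(classifier_layout(768, 1024, 2, 3, dropout_rate=0.3)):
    print(f"({index}): {layer}")
```

With these settings the list has seven entries, matching indices (0) through (6) of the `Sequential` in the printed architecture.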
Model Configuration
The model's configuration (config.json) includes custom parameters:
- hidden_dim: Size of the hidden layers in the classifier.
- hidden_activation: Activation function used in the classifier ('SwishGLU').
- num_layers: Number of layers in the classifier.
- dropout_rate: Dropout rate used in the classifier.
- pooling: Pooling strategy used ('mean').
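One way to sanity-check a downloaded config against the values reported in this card is to compare the custom keys directly. The sketch below assumes the keys are stored at the top level of config.json with the plain types shown; both the storage layout and the `compare_config` helper are assumptions for illustration.

```python
# Values reported in this model card; how config.json stores them is an assumption
REPORTED_VALUES = {
    "hidden_dim": 1024,
    "hidden_activation": "SwishGLU",
    "num_layers": 2,
    "dropout_rate": 0.3,
    "pooling": "mean",
}


def compare_config(config, reported=REPORTED_VALUES):
    """Return {key: (found, reported)} for keys that are missing or differ."""
    return {
        key: (config.get(key), value)
        for key, value in reported.items()
        if config.get(key) != value
    }


# Hypothetical config in which pooling was switched to 'cls'
sample = dict(REPORTED_VALUES, pooling="cls")
print(compare_config(sample))
```

For a real file, the dictionary could come from `json.load` on a local copy of config.json; an empty result means every custom key matches.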
License
This model is licensed under the MIT License.
Citation
If you use this model in your work, please consider citing it:
@misc{beno-2024-electra_base_classifier_sentiment,
title={Electra Base Classifier for Sentiment Analysis},
author={Jim Beno},
year={2024},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/jbeno/electra-base-classifier-sentiment}},
}
Contact
For questions or comments, please open an issue on the repository or contact Jim Beno.
Acknowledgments
- The Hugging Face Transformers library for providing powerful tools for model development.
- The creators of the ELECTRA model for their foundational work.
- The authors of the datasets used: Stanford Sentiment Treebank, DynaSent.
- Stanford Engineering CGOE, Chris Potts, and the Course Facilitators of XCS224U