SetFit

This is a SetFit model for text classification. It uses a LogisticRegression instance as its classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
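The contrastive step above trains on pairs of sentences rather than on single labelled sentences. The following is a minimal sketch of how such pairs might be built from a handful of labelled examples, assuming a target of 1.0 for same-label pairs and 0.0 for different-label pairs. The helper name `make_contrastive_pairs` is hypothetical and is not part of the SetFit API; the library's own sampler may differ in detail.

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, max_pairs=None, seed=0):
    """Build (text_a, text_b, target) triples from labelled texts.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    This is an illustrative helper, not SetFit's internal sampler.
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    # Shuffle deterministically so that truncation keeps a mix of targets.
    random.Random(seed).shuffle(pairs)
    return pairs if max_pairs is None else pairs[:max_pairs]


# Toy input: four sentences quoted from the label examples in this card.
texts = [
    "it is contradictory.",
    "because if its wrong how can you then make a statement saying it is not wrong",
    "it sounds very contradictory",
    "it reads well and makes sense",
]
labels = [
    "Linguistic (in)felicity",
    "Linguistic (in)felicity",
    "Lack of understanding / clear misunderstanding",
    "Lack of understanding / clear misunderstanding",
]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
```

Because every pair of examples can become a training item, even a small labelled set yields many training signals, which is what makes the approach suitable for few-shot settings.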

Model Details

Model Description

  • Model Type: SetFit
  • Classification head: a LogisticRegression instance
  • Maximum Sequence Length: 512 tokens
  • Number of Classes: 3

Model Labels

Label Examples
Enrichment / reinterpretation
  • 'the statement recognised the objective compassion but the opinion contradicted it'
  • "the person's individual belief doesn't tally with the accepted belief; this is perfectly reasonable."
  • 'cyberbully may seem cruel to everyone, but to tom, he does not feel cruel to him.'
Linguistic (in)felicity
  • 'because if its wrong how can you then make a statement saying it is not wrong'
  • 'it is contradictory.'
  • 'because the writer just stated that it s raining so how could she then not know if it is raining?'
Lack of understanding / clear misunderstanding
  • 'it sounds very contradictory'
  • 'it reads well and makes sense'
  • 'it make not sense on one hand help the homeless people is right, on the hand hand it is not unethical.'

Evaluation

Metrics

Label  Accuracy  Precision  Recall  F1
all    0.9211    0.9199     0.9031  0.9106
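The card does not say how precision, recall, and F1 were averaged across the three classes. As a hedged illustration only, the sketch below computes accuracy plus macro-averaged precision, recall, and F1 in plain Python. Macro averaging is an assumption here, and the function name `macro_metrics` and the toy labels are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro averaging (unweighted mean over classes) is an assumption;
    the card does not state which averaging was used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels, not taken from the evaluation set of this model.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
report = macro_metrics(y_true, y_pred)
```

With imbalanced classes, macro averages can differ noticeably from accuracy, since each class counts equally regardless of its size.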

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference
preds = model("it made sense because it is tom's opinion that cyberbullying is not wrong.")
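For more than one input, a small wrapper along the following lines may be convenient. It is a sketch, not a definitive recipe: the function name `classify_batch` is hypothetical, `"setfit_model_id"` is the same placeholder used above and must be replaced with the real model ID, and the import is done lazily so that the function can be defined without SetFit installed.

```python
def classify_batch(texts, model_id="setfit_model_id"):
    """Return (predictions, probabilities) for a list of input texts.

    model_id is a placeholder; pass the actual Hub ID or a local path.
    """
    # Imported lazily so that defining this function does not require setfit.
    from setfit import SetFitModel

    model = SetFitModel.from_pretrained(model_id)
    preds = model.predict(texts)        # one predicted label per text
    probs = model.predict_proba(texts)  # one probability row per text
    return preds, probs
```

Calling `classify_batch([...])` downloads the model on first use, so it needs network access or a local copy of the weights.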

Training Details

Training Set Metrics

Training set  Min  Median  Max
Word count    2    16.375  92

Label                                           Training Sample Count
Enrichment / reinterpretation                   29
Lack of understanding / clear misunderstanding  11
Linguistic (in)felicity                         112

Training Hyperparameters

  • batch_size: (16, 16)
  • num_epochs: (10, 10)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 20
  • body_learning_rate: (2e-05, 2e-05)
  • head_learning_rate: 2e-05
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 376
  • eval_max_steps: -1
  • load_best_model_at_end: False
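The hyperparameters listed above could be passed to SetFit roughly as follows. This is a sketch under assumptions: the names `HYPERPARAMETERS` and `build_training_arguments` are hypothetical, the training script actually used for this model is not shown in the card, and the imports are deferred so that the snippet loads without SetFit installed.

```python
# Values copied from the hyperparameter list in this card.
HYPERPARAMETERS = {
    "batch_size": (16, 16),            # (embedding phase, classifier phase)
    "num_epochs": (10, 10),
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "num_iterations": 20,
    "body_learning_rate": (2e-05, 2e-05),
    "head_learning_rate": 2e-05,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "l2_weight": 0.01,
    "seed": 376,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments():
    """Create a setfit.TrainingArguments object from the values above."""
    # Deferred imports: both packages are only needed when this is called.
    from sentence_transformers.losses import (
        BatchHardTripletLossDistanceFunction,
        CosineSimilarityLoss,
    )
    from setfit import TrainingArguments

    return TrainingArguments(
        loss=CosineSimilarityLoss,
        distance_metric=BatchHardTripletLossDistanceFunction.cosine_distance,
        **HYPERPARAMETERS,
    )
```

The resulting object would then be handed to a `setfit.Trainer` together with the model and the training dataset.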

Training Results

Epoch Step Training Loss Validation Loss
0.0026 1 0.2512 -
0.1316 50 0.2213 -
0.2632 100 0.1707 -
0.3947 150 0.0839 -
0.5263 200 0.0335 -
0.6579 250 0.0141 -
0.7895 300 0.0072 -
0.9211 350 0.0026 -
1.0526 400 0.0008 -
1.1842 450 0.0006 -
1.3158 500 0.0004 -
1.4474 550 0.0002 -
1.5789 600 0.0002 -
1.7105 650 0.0002 -
1.8421 700 0.0002 -
1.9737 750 0.0002 -
2.1053 800 0.0002 -
2.2368 850 0.0002 -
2.3684 900 0.0001 -
2.5 950 0.0001 -
2.6316 1000 0.0001 -
2.7632 1050 0.0001 -
2.8947 1100 0.0001 -
3.0263 1150 0.0001 -
3.1579 1200 0.0001 -
3.2895 1250 0.0001 -
3.4211 1300 0.0001 -
3.5526 1350 0.0001 -
3.6842 1400 0.0001 -
3.8158 1450 0.0001 -
3.9474 1500 0.0001 -
4.0789 1550 0.0002 -
4.2105 1600 0.0001 -
4.3421 1650 0.0033 -
4.4737 1700 0.0001 -
4.6053 1750 0.0004 -
4.7368 1800 0.0035 -
4.8684 1850 0.0002 -
5.0 1900 0.0003 -
5.1316 1950 0.0001 -
5.2632 2000 0.0001 -
5.3947 2050 0.0001 -
5.5263 2100 0.0001 -
5.6579 2150 0.0001 -
5.7895 2200 0.0001 -
5.9211 2250 0.0001 -
6.0526 2300 0.0001 -
6.1842 2350 0.0001 -
6.3158 2400 0.0001 -
6.4474 2450 0.0001 -
6.5789 2500 0.0001 -
6.7105 2550 0.0001 -
6.8421 2600 0.0001 -
6.9737 2650 0.0001 -
7.1053 2700 0.0001 -
7.2368 2750 0.0001 -
7.3684 2800 0.0001 -
7.5 2850 0.0 -
7.6316 2900 0.0001 -
7.7632 2950 0.0001 -
7.8947 3000 0.0001 -
8.0263 3050 0.0001 -
8.1579 3100 0.0001 -
8.2895 3150 0.0001 -
8.4211 3200 0.0001 -
8.5526 3250 0.0001 -
8.6842 3300 0.0001 -
8.8158 3350 0.0001 -
8.9474 3400 0.0001 -
9.0789 3450 0.0001 -
9.2105 3500 0.0001 -
9.3421 3550 0.0 -
9.4737 3600 0.0 -
9.6053 3650 0.0001 -
9.7368 3700 0.0001 -
9.8684 3750 0.0 -
10.0 3800 0.0 -
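The step counts in the table above are consistent with the training set size and hyperparameters reported in this card. The sketch below reproduces that arithmetic, assuming each training sample contributes `num_iterations` positive and `num_iterations` negative pairs per epoch. That pair count is an assumption about the sampler, chosen because it matches the logged steps; the function name `steps_per_epoch` is hypothetical.

```python
import math


def steps_per_epoch(n_samples, num_iterations, batch_size):
    """Number of optimizer steps in one epoch of the contrastive phase.

    Assumes 2 * num_iterations pairs per sample (positives and negatives).
    """
    n_pairs = 2 * num_iterations * n_samples
    return math.ceil(n_pairs / batch_size)


# Training sample counts per label, as reported in this card.
sample_counts = {
    "Enrichment / reinterpretation": 29,
    "Lack of understanding / clear misunderstanding": 11,
    "Linguistic (in)felicity": 112,
}
n_samples = sum(sample_counts.values())

per_epoch = steps_per_epoch(n_samples, num_iterations=20, batch_size=16)
total_steps = per_epoch * 10                 # num_epochs for the embedding phase
epoch_at_step_50 = round(50 / per_epoch, 4)  # compare with the second table row
```

Under these assumptions the run has 380 steps per epoch and 3800 steps in total, matching the last row of the table.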

Framework Versions

  • Python: 3.11.9
  • SetFit: 1.1.2
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.7.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
Model size: 109M params (Safetensors, tensor type F32)