---
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
widget:
  - text: >-
      it does not make sense because sally believe its makes sense and at the
      same time does not make  sense to help the homeless.
  - text: >-
      it contradicts itself- how can something be right and you then think it's
      not right?
  - text: it made sense because it is tom's opinion that cyberbullying is not wrong.
  - text: >-
      a person can think it is raining even when it is. there is nothing wrong
      with thinking that way. the thought makes sense even though the fact is
      incorrect.
  - text: >-
      they contradict their own opinions on the morals. although i can
      understand how they came to that conclusion. perhaps they mean, helping
      the homeless is morally right, however it's not right for my situation.
      context and clarification is key here.
metrics:
  - accuracy
  - precision
  - recall
  - f1
pipeline_tag: text-classification
library_name: setfit
inference: true
model-index:
  - name: SetFit
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: Unknown
          type: unknown
          split: test
        metrics:
          - type: accuracy
            value: 0.9210526315789473
            name: Accuracy
          - type: precision
            value: 0.9198717948717949
            name: Precision
          - type: recall
            value: 0.9030769230769231
            name: Recall
          - type: f1
            value: 0.9105882352941177
            name: F1
---

# SetFit

This is a SetFit model that can be used for Text Classification. A LogisticRegression instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
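
To make this two-stage recipe concrete, here is a minimal, hedged training sketch using the `setfit` library. The base checkpoint is an assumption (any pretrained Sentence Transformer works, not necessarily the one behind this model), and the four training examples are borrowed from the label examples listed further down; a real run would use the full few-shot training set.

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Assumption: illustrative base checkpoint, not necessarily the one used here.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

# A tiny few-shot training set (two examples per class), borrowed from this card.
train_dataset = Dataset.from_dict({
    "text": [
        "it is contradictory.",
        "because if its wrong how can you then make a statement saying it is not wrong",
        "the statement recognised the objective compassion but the opinion contradicted it",
        "cyberbully may seem cruel to everyone, but to tom, he does not feel cruel to him.",
    ],
    "label": [
        "Linguistic (in)felicity",
        "Linguistic (in)felicity",
        "Enrichment / reinterpretation",
        "Enrichment / reinterpretation",
    ],
})

trainer = Trainer(
    model=model,
    args=TrainingArguments(num_epochs=1, num_iterations=20),
    train_dataset=train_dataset,
)
# Stage 1: contrastive fine-tuning of the Sentence Transformer body.
# Stage 2: fitting the LogisticRegression head on the tuned embeddings.
trainer.train()
```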

## Model Details

### Model Description

- **Model Type:** SetFit
- **Classification head:** a LogisticRegression instance
- **Maximum Sequence Length:** 512 tokens
- **Number of Classes:** 3 classes

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)

### Model Labels

**Enrichment / reinterpretation**

- 'the statement recognised the objective compassion but the opinion contradicted it'
- "the person's individual belief doesn't tally with the accepted belief; this is perfectly reasonable."
- 'cyberbully may seem cruel to everyone, but to tom, he does not feel cruel to him.'

**Linguistic (in)felicity**

- 'because if its wrong how can you then make a statement saying it is not wrong'
- 'it is contradictory.'
- 'because the writer just stated that it s raining so how could she then not know if it is raining?'

**Lack of understanding / clear misunderstanding**

- 'it sounds very contradictory'
- 'it reads well and makes sense'
- 'it make not sense on one hand help the homeless people is right, on the hand hand it is not unethical.'

## Evaluation

### Metrics

| Label | Accuracy | Precision | Recall | F1     |
|:------|:---------|:----------|:-------|:-------|
| all   | 0.9211   | 0.9199    | 0.9031 | 0.9106 |
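
The test split behind these numbers is not published with the card, so the sketch below only illustrates how metrics of these types can be computed with scikit-learn; `test_texts`, `test_labels`, and the macro averaging are assumptions, not details taken from the card.

```python
from setfit import SetFitModel
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

model = SetFitModel.from_pretrained("setfit_model_id")

# Hypothetical stand-ins for the held-out test split.
test_texts = ["it is contradictory.", "it reads well and makes sense"]
test_labels = ["Linguistic (in)felicity", "Lack of understanding / clear misunderstanding"]

preds = model.predict(test_texts)

print("accuracy :", accuracy_score(test_labels, preds))
# Macro averaging is an assumption; the card does not say how
# precision/recall/F1 were aggregated across the 3 classes.
print("precision:", precision_score(test_labels, preds, average="macro", zero_division=0))
print("recall   :", recall_score(test_labels, preds, average="macro", zero_division=0))
print("f1       :", f1_score(test_labels, preds, average="macro", zero_division=0))
```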

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference:

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference
preds = model("it made sense because it is tom's opinion that cyberbullying is not wrong.")
```
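
The model also accepts a list of texts, and because the classification head is a LogisticRegression instance, class probabilities are available as well. A short sketch reusing `model` from above (the example texts are taken from this card):

```python
texts = [
    "it is contradictory.",
    "it made sense because it is tom's opinion that cyberbullying is not wrong.",
]
preds = model.predict(texts)        # one predicted label per input text
probs = model.predict_proba(texts)  # per-class probabilities from the LogisticRegression head
print(preds)
print(probs)
```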

## Training Details

### Training Set Metrics

| Training set | Min | Median | Max |
|:-------------|:----|:-------|:----|
| Word count   | 2   | 16.375 | 92  |

| Label | Training Sample Count |
|:------|:----------------------|
| Enrichment / reinterpretation | 29 |
| Lack of understanding / clear misunderstanding | 11 |
| Linguistic (in)felicity | 112 |
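
For reference, statistics like these can be recomputed from any labeled split; a minimal sketch, with `train_texts` and `train_labels` as hypothetical stand-ins for the actual training data:

```python
from collections import Counter
from statistics import median

# Hypothetical stand-ins for the actual training split.
train_texts = ["it is contradictory.", "it sounds very contradictory"]
train_labels = ["Linguistic (in)felicity", "Lack of understanding / clear misunderstanding"]

word_counts = [len(text.split()) for text in train_texts]
print("min:", min(word_counts), "median:", median(word_counts), "max:", max(word_counts))
print(Counter(train_labels))  # training sample count per label
```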

### Training Hyperparameters

- batch_size: (16, 16)
- num_epochs: (10, 10)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 20
- body_learning_rate: (2e-05, 2e-05)
- head_learning_rate: 2e-05
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- l2_weight: 0.01
- seed: 376
- eval_max_steps: -1
- load_best_model_at_end: False
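
These values map one-to-one onto `setfit.TrainingArguments`. A hedged reconstruction of the configuration (the loss and distance-metric objects are imported from sentence-transformers under the names listed above; tuples configure the embedding and classifier phases separately):

```python
from sentence_transformers.losses import BatchHardTripletLossDistanceFunction, CosineSimilarityLoss
from setfit import TrainingArguments

args = TrainingArguments(
    batch_size=(16, 16),                # (embedding phase, classifier phase)
    num_epochs=(10, 10),
    max_steps=-1,                       # no hard cap on training steps
    sampling_strategy="oversampling",
    num_iterations=20,                  # pair-generation iterations for contrastive learning
    body_learning_rate=(2e-05, 2e-05),
    head_learning_rate=2e-05,
    loss=CosineSimilarityLoss,
    distance_metric=BatchHardTripletLossDistanceFunction.cosine_distance,
    margin=0.25,
    end_to_end=False,                   # classifier phase trains only the head
    use_amp=False,
    warmup_proportion=0.1,
    l2_weight=0.01,
    seed=376,
    eval_max_steps=-1,
    load_best_model_at_end=False,
)
```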

### Training Results

| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0026 | 1    | 0.2512        | -               |
| 0.1316 | 50   | 0.2213        | -               |
| 0.2632 | 100  | 0.1707        | -               |
| 0.3947 | 150  | 0.0839        | -               |
| 0.5263 | 200  | 0.0335        | -               |
| 0.6579 | 250  | 0.0141        | -               |
| 0.7895 | 300  | 0.0072        | -               |
| 0.9211 | 350  | 0.0026        | -               |
| 1.0526 | 400  | 0.0008        | -               |
| 1.1842 | 450  | 0.0006        | -               |
| 1.3158 | 500  | 0.0004        | -               |
| 1.4474 | 550  | 0.0002        | -               |
| 1.5789 | 600  | 0.0002        | -               |
| 1.7105 | 650  | 0.0002        | -               |
| 1.8421 | 700  | 0.0002        | -               |
| 1.9737 | 750  | 0.0002        | -               |
| 2.1053 | 800  | 0.0002        | -               |
| 2.2368 | 850  | 0.0002        | -               |
| 2.3684 | 900  | 0.0001        | -               |
| 2.5    | 950  | 0.0001        | -               |
| 2.6316 | 1000 | 0.0001        | -               |
| 2.7632 | 1050 | 0.0001        | -               |
| 2.8947 | 1100 | 0.0001        | -               |
| 3.0263 | 1150 | 0.0001        | -               |
| 3.1579 | 1200 | 0.0001        | -               |
| 3.2895 | 1250 | 0.0001        | -               |
| 3.4211 | 1300 | 0.0001        | -               |
| 3.5526 | 1350 | 0.0001        | -               |
| 3.6842 | 1400 | 0.0001        | -               |
| 3.8158 | 1450 | 0.0001        | -               |
| 3.9474 | 1500 | 0.0001        | -               |
| 4.0789 | 1550 | 0.0002        | -               |
| 4.2105 | 1600 | 0.0001        | -               |
| 4.3421 | 1650 | 0.0033        | -               |
| 4.4737 | 1700 | 0.0001        | -               |
| 4.6053 | 1750 | 0.0004        | -               |
| 4.7368 | 1800 | 0.0035        | -               |
| 4.8684 | 1850 | 0.0002        | -               |
| 5.0    | 1900 | 0.0003        | -               |
| 5.1316 | 1950 | 0.0001        | -               |
| 5.2632 | 2000 | 0.0001        | -               |
| 5.3947 | 2050 | 0.0001        | -               |
| 5.5263 | 2100 | 0.0001        | -               |
| 5.6579 | 2150 | 0.0001        | -               |
| 5.7895 | 2200 | 0.0001        | -               |
| 5.9211 | 2250 | 0.0001        | -               |
| 6.0526 | 2300 | 0.0001        | -               |
| 6.1842 | 2350 | 0.0001        | -               |
| 6.3158 | 2400 | 0.0001        | -               |
| 6.4474 | 2450 | 0.0001        | -               |
| 6.5789 | 2500 | 0.0001        | -               |
| 6.7105 | 2550 | 0.0001        | -               |
| 6.8421 | 2600 | 0.0001        | -               |
| 6.9737 | 2650 | 0.0001        | -               |
| 7.1053 | 2700 | 0.0001        | -               |
| 7.2368 | 2750 | 0.0001        | -               |
| 7.3684 | 2800 | 0.0001        | -               |
| 7.5    | 2850 | 0.0           | -               |
| 7.6316 | 2900 | 0.0001        | -               |
| 7.7632 | 2950 | 0.0001        | -               |
| 7.8947 | 3000 | 0.0001        | -               |
| 8.0263 | 3050 | 0.0001        | -               |
| 8.1579 | 3100 | 0.0001        | -               |
| 8.2895 | 3150 | 0.0001        | -               |
| 8.4211 | 3200 | 0.0001        | -               |
| 8.5526 | 3250 | 0.0001        | -               |
| 8.6842 | 3300 | 0.0001        | -               |
| 8.8158 | 3350 | 0.0001        | -               |
| 8.9474 | 3400 | 0.0001        | -               |
| 9.0789 | 3450 | 0.0001        | -               |
| 9.2105 | 3500 | 0.0001        | -               |
| 9.3421 | 3550 | 0.0           | -               |
| 9.4737 | 3600 | 0.0           | -               |
| 9.6053 | 3650 | 0.0001        | -               |
| 9.7368 | 3700 | 0.0001        | -               |
| 9.8684 | 3750 | 0.0           | -               |
| 10.0   | 3800 | 0.0           | -               |

### Framework Versions

- Python: 3.11.9
- SetFit: 1.1.2
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.7.1
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```