sgpt-bloom-1b7-nli
Usage
For usage instructions, refer to: https://github.com/Muennighoff/sgpt#symmetric-semantic-search
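As a rough illustration of symmetric semantic search, the sketch below compares two sentence embeddings by cosine similarity. It assumes the checkpoint can be loaded through the `sentence-transformers` package under the repository id `bigscience-data/sgpt-bloom-1b7-nli`; the helper names `cosine_similarity` and `embed` are hypothetical. The linked instructions remain the authoritative reference.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed(texts, model_id="bigscience-data/sgpt-bloom-1b7-nli"):
    """Encode a list of texts into embeddings.

    Assumption: the checkpoint loads via sentence-transformers under this
    repository id. The import is deferred so that `cosine_similarity`
    can be used without the package installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(texts)


# Example call (downloads the model, so it is left commented out):
# a, b = embed(["How do I install Python?", "What is the way to set up Python?"])
# print(cosine_similarity(a, b))
```

In a symmetric setting both texts are encoded the same way, so a single `embed` call serves queries and candidates alike.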
The model was trained with the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch examples/training/nli/training_nli_v2.py --model_name bigscience/bloom-1b3 --freezenonbias --train_batch_size 128 --lr 32e-5 --pooling weightedmean --wandb --wandbwatchlog gradients --gradcache --chunksize 4
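The `--freezenonbias` flag corresponds to BitFit-style training, in which only bias terms are updated. The following is a minimal sketch of that idea, not the training script's actual code: it assumes bias parameters can be recognised by the substring "bias" in their names, and the helper `freeze_non_bias` and the toy parameter names are hypothetical.

```python
from types import SimpleNamespace


def freeze_non_bias(named_parameters):
    """Freeze every parameter that is not a bias term (BitFit-style).

    `named_parameters` is any iterable of (name, parameter) pairs whose
    parameters expose a `requires_grad` attribute, for example the output
    of `torch.nn.Module.named_parameters()`.
    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in named_parameters:
        # Assumption: bias parameters are identified by "bias" in the name.
        if "bias" not in name:
            param.requires_grad = False
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Toy stand-ins for real parameters (hypothetical names).
toy_params = [
    ("h.0.mlp.dense_h_to_4h.weight", SimpleNamespace(requires_grad=True)),
    ("h.0.mlp.dense_h_to_4h.bias", SimpleNamespace(requires_grad=True)),
]
trainable_names = freeze_non_bias(toy_params)
```

Because only a small fraction of the parameters receive gradients, this setup reduces optimizer memory compared with full fine-tuning.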
Evaluation Results
{'askubuntu': 57.44, 'cqadupstack': 14.18, 'twitterpara': 73.99, 'scidocs': 74.74, 'avg': 55.087500000000006}
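The `avg` entry is the unweighted mean of the four task scores, which can be checked with a few lines of Python (the variable names are illustrative only):

```python
# Scores copied from the results above, excluding the 'avg' entry.
scores = {
    "askubuntu": 57.44,
    "cqadupstack": 14.18,
    "twitterpara": 73.99,
    "scidocs": 74.74,
}

# Unweighted mean across tasks.
average = sum(scores.values()) / len(scores)
print(round(average, 4))
```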
Training
The model was trained with the parameters:
DataLoader:
sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader
of length 4403 with parameters:
{'batch_size': 128}
The model uses BitFit, weighted-mean pooling, and GradCache. For details, see: https://arxiv.org/abs/2202.08904
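To make weighted-mean pooling concrete, here is a small pure-Python sketch for a single sequence. It assumes the position-weighted scheme described in the SGPT paper, where the token at 1-based position i gets weight i and padding tokens get weight 0; the function name and toy inputs are hypothetical, and a real implementation would operate on batched tensors.

```python
def weighted_mean_pool(token_embeddings, attention_mask):
    """Position-weighted mean over the non-padding tokens of one sequence.

    token_embeddings: list of per-token vectors (lists of floats).
    attention_mask: list of 0/1 flags, 1 for real tokens.
    Later tokens, which have attended to more context in a causal model,
    contribute more to the pooled vector.
    """
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    weight_sum = 0.0
    pairs = zip(token_embeddings, attention_mask)
    for position, (vector, keep) in enumerate(pairs, start=1):
        weight = position * keep  # padding tokens contribute nothing
        weight_sum += weight
        for d in range(dim):
            pooled[d] += weight * vector[d]
    return [value / weight_sum for value in pooled]


# Three real tokens and one padding token, 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = weighted_mean_pool(tokens, mask)
```

With weights 1, 2, 3 for the real tokens, the pooled vector is (4/6, 5/6), and the padded token has no effect.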
Loss:
sentence_transformers.losses.MultipleNegativesRankingLoss.MNRLGradCache
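The idea behind a multiple-negatives ranking loss can be sketched as follows: each anchor should score highest against its own positive, and the other positives in the batch act as negatives. This is a simplified illustration, not the `MNRLGradCache` code itself. It assumes cosine similarity with a scale factor of 20 and does not cover the GradCache chunking; the function names are hypothetical.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_norm = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_norm - logits[i]
    return total / len(anchors)


# Toy batch: correctly aligned pairs versus deliberately swapped pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

Larger batches supply more in-batch negatives per anchor, which is one reason a batch size of 128 is attractive for this loss.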
Parameters of the fit()-Method:
{
"epochs": 1,
"evaluation_steps": 440,
"evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
"max_grad_norm": 1,
"optimizer_class": "<class 'transformers.optimization.AdamW'>",
"optimizer_params": {
"lr": 0.00032
},
"scheduler": "WarmupLinear",
"steps_per_epoch": null,
"warmup_steps": 441,
"weight_decay": 0.01
}
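The learning-rate schedule implied by these parameters can be sketched as below. It assumes `WarmupLinear` means a linear ramp from 0 to the base rate over the warmup steps, followed by linear decay to 0 at the final step, and that the total step count is the DataLoader length times the number of epochs; the function name is hypothetical.

```python
import math

STEPS_PER_EPOCH = 4403  # DataLoader length reported above
EPOCHS = 1
BASE_LR = 0.00032       # 32e-5 on the command line
WARMUP_STEPS = 441
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS

# The warmup length equals 10% of the total steps, rounded up.
TEN_PERCENT_OF_STEPS = math.ceil(0.1 * TOTAL_STEPS)


def warmup_linear_lr(step):
    """Learning rate at a given optimizer step under the assumed schedule."""
    if step < WARMUP_STEPS:
        factor = step / max(1, WARMUP_STEPS)
    else:
        remaining = TOTAL_STEPS - step
        factor = max(0.0, remaining / max(1, TOTAL_STEPS - WARMUP_STEPS))
    return BASE_LR * factor


# Sample the schedule at the start, the peak, and the end of training.
for step in (0, WARMUP_STEPS, TOTAL_STEPS):
    print(step, warmup_linear_lr(step))
```

The rate peaks at step 441 and then falls linearly, so the evaluation at every 440 steps first runs just before the peak.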
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 75, 'do_lower_case': False}) with Transformer model: BloomModel
(1): Pooling({'word_embedding_dimension': 2048, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': True, 'pooling_mode_lasttoken': False})
)
Citing & Authors
@article{muennighoff2022sgpt,
title={SGPT: GPT Sentence Embeddings for Semantic Search},
author={Muennighoff, Niklas},
journal={arXiv preprint arXiv:2202.08904},
year={2022}
}
MTEB Evaluation Results
- accuracy on MTEB AmazonReviewsClassification (fr), test set, self-reported: 39.286
- f1 on MTEB AmazonReviewsClassification (fr), test set, self-reported: 38.871
- accuracy on MTEB AmazonReviewsClassification (zh), test set, self-reported: 37.634
- f1 on MTEB AmazonReviewsClassification (zh), test set, self-reported: 36.860
- accuracy on MTEB MTOPDomainClassification (fr), test set, self-reported: 83.799
- f1 on MTEB MTOPDomainClassification (fr), test set, self-reported: 83.723
- accuracy on MTEB MTOPIntentClassification (fr), test set, self-reported: 63.360
- f1 on MTEB MTOPIntentClassification (fr), test set, self-reported: 44.262
- accuracy on MTEB MassiveIntentClassification (fr), test set, self-reported: 64.576
- f1 on MTEB MassiveIntentClassification (fr), test set, self-reported: 62.605