
A BERT model for computing sentence embeddings for Russian. It is based on cointegrated/LaBSE-en-ru and has the same context length (512 tokens), embedding size (768) and speed.

Usage:

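from sentence_transformers import SentenceTransformer, util

# Download the model from the Hugging Face Hub (or load it from the local cache)
model = SentenceTransformer('sergeyzh/LaBSE-ru-turbo')

sentences = ["привет мир", "hello world", "здравствуй вселенная"]
# One 768-dimensional vector per sentence
embeddings = model.encode(sentences)
# 3 x 3 matrix of pairwise dot-product similarities
print(util.dot_score(embeddings, embeddings))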
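`util.dot_score` returns the matrix of pairwise dot products between the embeddings. The plain-Python sketch below reproduces that computation on hypothetical 3-dimensional toy vectors (not model output) to illustrate that a dot product equals cosine similarity only for L2-normalized vectors. This card does not say whether the model's embeddings are normalized; if that matters, `util.cos_sim(embeddings, embeddings)`, or `model.encode(sentences, normalize_embeddings=True)` followed by `util.dot_score`, gives cosine scores either way.

```python
import math


def dot_score(a, b):
    """Pairwise dot products between the rows of a and b (the matrix a @ b.T)."""
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


def l2_normalize(rows):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid dividing by zero
        normalized.append([x / norm for x in row])
    return normalized


def cos_sim(a, b):
    """Cosine similarity: the dot product of L2-normalized rows."""
    return dot_score(l2_normalize(a), l2_normalize(b))


# Hypothetical 3-dimensional toy vectors standing in for the 768-dimensional
# embeddings; the second vector points the same way as the first but is twice as long.
emb = [[3.0, 4.0, 0.0], [6.0, 8.0, 0.0], [0.0, 0.0, 2.0]]

print(dot_score(emb, emb))  # grows with vector length
print(cos_sim(emb, emb))    # depends only on direction
```

For unit-length embeddings the two matrices coincide, which is the case in which the `util.dot_score` output can be read directly as a cosine score.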

Metrics

Model scores on the encodechka benchmark:

Model                              CPU    GPU  size  Mean S  Mean S+W   dim
sergeyzh/LaBSE-ru-turbo         120.40   8.05   490   0.789     0.702   768
BAAI/bge-m3                     523.40  22.50  2166   0.787     0.696  1024
intfloat/multilingual-e5-large  506.80  30.80  2136   0.780     0.686  1024
intfloat/multilingual-e5-base   130.61  14.39  1061   0.761     0.669   768
sergeyzh/rubert-tiny-turbo        5.51   3.25   111   0.749     0.667   312
intfloat/multilingual-e5-small   40.86  12.09   449   0.742     0.645   384
cointegrated/LaBSE-en-ru        120.40   8.05   490   0.739     0.667   768

CPU, GPU: encoding time in ms (lower is faster); size: model size in MB; dim: embedding dimension; Mean S: average over the sentence-level tasks; Mean S+W: average over the sentence- and word-level tasks.

Model                             STS     PI    NLI     SA     TI     IA     IC    ICX    NE1    NE2
sergeyzh/LaBSE-ru-turbo         0.864  0.748  0.490  0.814  0.974  0.806  0.815  0.801  0.305  0.404
BAAI/bge-m3                     0.864  0.749  0.510  0.819  0.973  0.792  0.809  0.783  0.240  0.422
intfloat/multilingual-e5-large  0.862  0.727  0.473  0.810  0.979  0.798  0.819  0.773  0.224  0.374
intfloat/multilingual-e5-base   0.835  0.704  0.459  0.796  0.964  0.783  0.802  0.738  0.235  0.376
sergeyzh/rubert-tiny-turbo      0.828  0.722  0.476  0.787  0.955  0.757  0.780  0.685  0.305  0.373
intfloat/multilingual-e5-small  0.822  0.714  0.457  0.758  0.957  0.761  0.779  0.691  0.234  0.275
cointegrated/LaBSE-en-ru        0.794  0.659  0.431  0.761  0.946  0.766  0.789  0.769  0.340  0.414

STS: semantic textual similarity; PI: paraphrase identification; NLI: natural language inference; SA: sentiment analysis; TI: toxicity identification; IA: inappropriateness assessment; IC: intent classification; ICX: cross-lingual intent classification; NE1, NE2: named entity recognition (two datasets).
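The card does not define the two summary columns. One reading consistent with the numbers is that Mean S is the unweighted mean of the eight sentence-level tasks (STS through ICX) and Mean S+W is the mean of all ten tasks, including the word-level NE1 and NE2. It reproduces the reported values to within rounding for the three rows below and for most of the others, although the multilingual-e5-large Mean S+W differs by about 0.002. A minimal sketch under that assumption:

```python
# Per-task encodechka scores copied from the table above, in column order:
# STS, PI, NLI, SA, TI, IA, IC, ICX (sentence-level), then NE1, NE2 (word-level).
SCORES = {
    "sergeyzh/LaBSE-ru-turbo": [0.864, 0.748, 0.490, 0.814, 0.974, 0.806, 0.815, 0.801, 0.305, 0.404],
    "BAAI/bge-m3": [0.864, 0.749, 0.510, 0.819, 0.973, 0.792, 0.809, 0.783, 0.240, 0.422],
    "cointegrated/LaBSE-en-ru": [0.794, 0.659, 0.431, 0.761, 0.946, 0.766, 0.789, 0.769, 0.340, 0.414],
}

# Assumption: the first eight columns are the sentence-level tasks.
N_SENTENCE_TASKS = 8


def mean(values):
    return sum(values) / len(values)


def summarize(scores):
    """Return (Mean S, Mean S+W) under the assumed definitions."""
    mean_s = mean(scores[:N_SENTENCE_TASKS])  # sentence-level tasks only
    mean_sw = mean(scores)                    # sentence- and word-level tasks
    return mean_s, mean_sw


for name, scores in SCORES.items():
    mean_s, mean_sw = summarize(scores)
    print(f"{name}: Mean S = {mean_s:.3f}, Mean S+W = {mean_sw:.3f}")
```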

Model scores on the ruMTEB benchmark:

Task                             Metric               sbert_large_mt_nlu_ru  sbert_large_nlu_ru  LaBSE-ru-sts  LaBSE-ru-turbo  multilingual-e5-small  multilingual-e5-base  multilingual-e5-large
CEDRClassification               Accuracy                             0.368               0.358         0.418           0.451                  0.401                 0.423                  0.448
GeoreviewClassification          Accuracy                             0.397               0.400         0.406           0.438                  0.447                 0.461                  0.497
GeoreviewClusteringP2P           V-measure                            0.584               0.590         0.626           0.644                  0.586                 0.545                  0.605
HeadlineClassification           Accuracy                             0.772               0.793         0.633           0.688                  0.732                 0.757                  0.758
InappropriatenessClassification  Accuracy                             0.646               0.625         0.599           0.615                  0.592                 0.588                  0.616
KinopoiskClassification          Accuracy                             0.503               0.495         0.496           0.521                  0.500                 0.509                  0.566
RiaNewsRetrieval                 NDCG@10                              0.214               0.111         0.651           0.694                  0.700                 0.702                  0.807
RuBQReranking                    MAP@10                               0.561               0.468         0.688           0.687                  0.715                 0.720                  0.756
RuBQRetrieval                    NDCG@10                              0.298               0.124         0.622           0.657                  0.685                 0.696                  0.741
RuReviewsClassification          Accuracy                             0.589               0.583         0.599           0.632                  0.612                 0.630                  0.653
RuSTSBenchmarkSTS                Pearson correlation                  0.712               0.588         0.788           0.822                  0.781                 0.796                  0.831
RuSciBenchGRNTIClassification    Accuracy                             0.542               0.539         0.529           0.569                  0.550                 0.563                  0.582
RuSciBenchGRNTIClusteringP2P     V-measure                            0.522               0.504         0.486           0.517                  0.511                 0.516                  0.520
RuSciBenchOECDClassification     Accuracy                             0.438               0.430         0.406           0.440                  0.427                 0.423                  0.445
RuSciBenchOECDClusteringP2P      V-measure                            0.473               0.464         0.426           0.452                  0.443                 0.448                  0.450
SensitiveTopicsClassification    Accuracy                             0.285               0.280         0.262           0.272                  0.228                 0.234                  0.257
TERRaClassification              Average Precision                    0.520               0.502         0.587           0.585                  0.551                 0.550                  0.584

Task type                        Metric               sbert_large_mt_nlu_ru  sbert_large_nlu_ru  LaBSE-ru-sts  LaBSE-ru-turbo  multilingual-e5-small  multilingual-e5-base  multilingual-e5-large
Classification                   Accuracy                             0.554               0.552         0.524           0.558                  0.551                 0.561                  0.588
Clustering                       V-measure                            0.526               0.519         0.513           0.538                  0.513                 0.503                  0.525
MultiLabelClassification         Accuracy                             0.326               0.319         0.340           0.361                  0.314                 0.329                  0.353
PairClassification               Average Precision                    0.520               0.502         0.587           0.585                  0.551                 0.550                  0.584
Reranking                        MAP@10                               0.561               0.468         0.688           0.687                  0.715                 0.720                  0.756
Retrieval                        NDCG@10                              0.256               0.118         0.637           0.675                  0.697                 0.699                  0.774
STS                              Pearson correlation                  0.712               0.588         0.788           0.822                  0.781                 0.796                  0.831
Average                          Average                              0.494               0.438         0.582           0.604                  0.588                 0.594                  0.630
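The per-type ruMTEB table appears to be derived from the per-task one: averaging the tasks of each type, then averaging the seven type scores, reproduces the LaBSE-ru-turbo column to within rounding (for example, Retrieval = (0.694 + 0.657) / 2 ≈ 0.675). The task-to-type mapping below is an assumption inferred from the task names, not something the card states. A minimal sketch:

```python
# Per-task ruMTEB scores of sergeyzh/LaBSE-ru-turbo, copied from the table above.
TASK_SCORES = {
    "CEDRClassification": 0.451,
    "GeoreviewClassification": 0.438,
    "GeoreviewClusteringP2P": 0.644,
    "HeadlineClassification": 0.688,
    "InappropriatenessClassification": 0.615,
    "KinopoiskClassification": 0.521,
    "RiaNewsRetrieval": 0.694,
    "RuBQReranking": 0.687,
    "RuBQRetrieval": 0.657,
    "RuReviewsClassification": 0.632,
    "RuSTSBenchmarkSTS": 0.822,
    "RuSciBenchGRNTIClassification": 0.569,
    "RuSciBenchGRNTIClusteringP2P": 0.517,
    "RuSciBenchOECDClassification": 0.440,
    "RuSciBenchOECDClusteringP2P": 0.452,
    "SensitiveTopicsClassification": 0.272,
    "TERRaClassification": 0.585,
}

# Assumed task-to-type mapping (inferred from the task names).
TASK_TYPES = {
    "Classification": [
        "GeoreviewClassification", "HeadlineClassification",
        "InappropriatenessClassification", "KinopoiskClassification",
        "RuReviewsClassification", "RuSciBenchGRNTIClassification",
        "RuSciBenchOECDClassification",
    ],
    "Clustering": [
        "GeoreviewClusteringP2P", "RuSciBenchGRNTIClusteringP2P",
        "RuSciBenchOECDClusteringP2P",
    ],
    "MultiLabelClassification": ["CEDRClassification", "SensitiveTopicsClassification"],
    "PairClassification": ["TERRaClassification"],
    "Reranking": ["RuBQReranking"],
    "Retrieval": ["RiaNewsRetrieval", "RuBQRetrieval"],
    "STS": ["RuSTSBenchmarkSTS"],
}

# Sanity check: every task belongs to exactly one type.
assert sorted(t for tasks in TASK_TYPES.values() for t in tasks) == sorted(TASK_SCORES)


def mean(values):
    return sum(values) / len(values)


def summarize(task_scores, task_types):
    """Average the tasks of each type, then average the per-type means."""
    per_type = {
        task_type: mean([task_scores[task] for task in tasks])
        for task_type, tasks in task_types.items()
    }
    overall = mean(list(per_type.values()))
    return per_type, overall


per_type, overall = summarize(TASK_SCORES, TASK_TYPES)
for task_type, score in per_type.items():
    print(f"{task_type:<26}{score:.3f}")
print(f"{'Average':<26}{overall:.3f}")
```

Under this reading the overall Average weights task types rather than tasks, so the single PairClassification and STS tasks each count as much as all seven classification tasks together.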
Model size: 128M parameters (Safetensors, tensor type F32)