This model is a Sentence Transformers model trained for semantic textual similarity and embedding tasks. It maps sentences and paragraphs to a dense vector space, where semantically similar texts are close together.
The model is available in two sizes: Small (123M parameters) and Large (353M parameters).
First install the Sentence Transformers library:
pip install sentence-transformers==3.4.1
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("PartAI/Tooka-SBERT-V2-Small")
# Run inference
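sentences = [
# "The crane is a migratory bird with long legs and a long neck."
'درنا از پرندگان مهاجر با پاهای بلند و گردن دراز است.',
# "Cranes, with their tall stature and broad wings, are among the most beautiful migratory birds."
'درناها با قامتی بلند و بالهای پهن، از زیباترین پرندگان مهاجر به شمار میروند.',
# "Cranes are small birds with short legs that do not migrate."
'درناها پرندگانی کوچک با پاهای کوتاه هستند که مهاجرت نمیکنند.'
]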
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
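To make the similarity step more concrete, here is a minimal sketch of a pairwise cosine similarity matrix in plain Python. It assumes cosine similarity is the configured similarity function (the Sentence Transformers default unless a model specifies otherwise). The helper names and the three-dimensional toy vectors are hypothetical stand-ins, not real model outputs.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(embeddings):
    """Pairwise cosine similarities between all rows of `embeddings`."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]

# Toy vectors (assumption): the first two point in similar directions,
# the third points elsewhere, mirroring the example sentences above.
toy_embeddings = [
    [1.0, 0.0, 1.0],
    [1.0, 0.1, 0.9],
    [-1.0, 1.0, 0.0],
]

matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 3) for value in row])
```

With real embeddings, the same reading applies: larger values in a row indicate sentences the model considers closer in meaning.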
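The training is performed in two stages, using the asymmetric prefixes and loss functions listed below:
"سوال: " ("Question: ", for queries)
"متن: " ("Text: ", for documents)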
CachedMultipleNegativesRankingLoss
CachedMultipleNegativesRankingLoss
CoSENTLoss
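As a rough illustration of the two objectives named above, the sketch below computes an in-batch multiple-negatives ranking loss and a CoSENT-style loss in plain Python. It is not the training code for this model: the function names are hypothetical, the `scale=20.0` value is an assumption taken from the Sentence Transformers defaults, and the gradient-caching mechanism that distinguishes the Cached variant (Gao et al., 2021, cited below) is omitted because it changes memory use, not the loss value.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positive in the batch serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

def cosent_loss(similarities, labels, scale=20.0):
    """log(1 + sum(exp(scale * (s_low - s_high)))) over all pairs of examples
    where the gold label of `high` exceeds the gold label of `low`."""
    terms = [
        scale * (similarities[j] - similarities[i])
        for i in range(len(labels))
        for j in range(len(labels))
        if labels[i] > labels[j]
    ]
    return math.log(1.0 + sum(math.exp(t) for t in terms))

# Toy batch (assumption): each anchor matches the positive at the same index.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)

# Predicted similarities that agree vs. disagree with the gold ordering.
ordered = cosent_loss([0.9, 0.1], [1.0, 0.0])
reversed_order = cosent_loss([0.1, 0.9], [1.0, 0.0])
print(ordered < reversed_order)
```

Both losses reward the same thing from different supervision: the ranking loss only needs matched pairs, while the CoSENT-style loss needs graded similarity labels and penalizes pairs whose predicted order contradicts them.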
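We evaluate our model on the PTEB Benchmark. Both sizes score higher than multilingual-e5-base (mE5-Base) on the cross-task average (72.05 for Large and 70.62 for Small versus 70.09), although mE5-Base remains ahead on Retrieval and Reranking.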
For Retrieval and Reranking tasks, we follow the same asymmetric structure, prepending "سوال: " to queries and "متن: " to documents.

Model | #Params | Pair-Classification-Avg | Classification-Avg | Retrieval-Avg | Reranking-Avg | CrossTasks-Avg |
---|---|---|---|---|---|---|
Tooka-SBERT-V2-Large | 353M | 80.24 | 74.73 | 59.80 | 73.44 | 72.05 |
Tooka-SBERT-V2-Small | 123M | 75.69 | 72.16 | 61.24 | 73.40 | 70.62 |
jina-embeddings-v3 | 572M | 71.88 | 79.27 | 65.18 | 64.62 | 70.24 |
multilingual-e5-base | 278M | 70.76 | 69.71 | 63.90 | 76.01 | 70.09 |
Tooka-SBERT-V1-Large | 353M | 81.52 | 71.54 | 45.61 | 60.44 | 64.78 |
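The asymmetric prefixing used for Retrieval and Reranking can be sketched as follows. The helper names (`prepare_queries`, `prepare_documents`, `rank_by_score`) and the sample query are hypothetical; only the two prefix strings and the model name come from this card. The model calls are left as comments so the snippet runs without downloading weights.

```python
QUERY_PREFIX = "سوال: "      # "Question: "
DOCUMENT_PREFIX = "متن: "    # "Text: "

def prepare_queries(queries):
    """Prepend the query prefix to every query string."""
    return [QUERY_PREFIX + query for query in queries]

def prepare_documents(documents):
    """Prepend the document prefix to every document string."""
    return [DOCUMENT_PREFIX + document for document in documents]

def rank_by_score(scores):
    """Return document indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical query ("What is a crane?") against two of the example sentences.
queries = prepare_queries(["درنا چیست؟"])
documents = prepare_documents([
    "درنا از پرندگان مهاجر با پاهای بلند و گردن دراز است.",
    "درناها پرندگانی کوچک با پاهای کوتاه هستند که مهاجرت نمیکنند.",
])

# With the model loaded, scoring could look like this (not executed here):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("PartAI/Tooka-SBERT-V2-Small")
# scores = model.similarity(model.encode(queries), model.encode(documents))[0].tolist()
# print(rank_by_score(scores))

print(rank_by_score([0.2, 0.9, 0.5]))
```

Keeping the prefixes in two constants makes it harder to apply the query prefix to documents by mistake, which would silently break the asymmetric setup.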
Pair-Classification:
Classification:
Retrieval:
Reranking:
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Base model: PartAI/TookaBERT-Base