xiaobu-embedding
Model: obtained by multi-task fine-tuning of the GTE model [1].
Data: chit-chat Query-Query pairs, knowledge-oriented Query-Doc pairs, and the open-source BGE Query-Doc data [2]; positives were cleaned and medium-difficulty negatives were mined (see the sketch below); about 6M pairs in total, with data quality prioritized over volume.
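The exact mining procedure is not published; the snippet below is only a minimal sketch of one common way to mine medium-difficulty negatives with an existing embedding model: rank the corpus for each query, skip the very top hits (which are often unlabeled positives), and keep mid-rank documents that are not known positives. All data, names, and thresholds here are illustrative assumptions.

```python
# Illustrative only — not the authors' published pipeline. Placeholder data;
# in practice `queries`, `corpus`, and `gold` come from the training set.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lier007/xiaobu-embedding")

queries = ["样例查询"]
corpus = ["相关文档", "干扰文档-1", "干扰文档-2", "干扰文档-3"]
gold = {0: {0}}  # query index -> indices of known positive documents

q_emb = model.encode(queries, normalize_embeddings=True)
d_emb = model.encode(corpus, normalize_embeddings=True)
scores = q_emb @ d_emb.T  # cosine similarities (embeddings are unit-normalized)

def mine_medium_negatives(qi: int, skip_top: int = 1, take: int = 2) -> list[int]:
    """Skip the highest-ranked docs (often unlabeled positives) and keep mid-rank ones."""
    ranked = np.argsort(-scores[qi])
    candidates = [int(d) for d in ranked[skip_top:] if int(d) not in gold[qi]]
    return candidates[:take]

print(mine_medium_negatives(0))
```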
Usage (Sentence-Transformers)
pip install -U sentence-transformers
Compute similarity:
from sentence_transformers import SentenceTransformer

sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
model = SentenceTransformer('lier007/xiaobu-embedding')
# normalize_embeddings=True returns unit-length vectors,
# so the dot product below equals cosine similarity.
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
Evaluation
See the Chinese C-MTEB evaluation in the BGE repository [2]; a minimal sketch using the open-source mteb package is shown below.
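The model card itself only points to the BGE / C-MTEB evaluation code; the following is a hedged sketch using the mteb package (assumed installed via `pip install mteb`). The task name AFQMC matches the results listed further down; the exact mteb API differs somewhat between versions.

```python
# Minimal evaluation sketch; check your installed mteb version's API before relying on it.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lier007/xiaobu-embedding")
evaluation = MTEB(tasks=["AFQMC"])  # one of the C-MTEB STS tasks
evaluation.run(model, output_folder="results/xiaobu-embedding")
```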
Finetune
See the fine-tuning module in the BGE (FlagEmbedding) repository [2]; a sketch of the expected training-data format follows.
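The BGE fine-tuning scripts expect JSON-lines training data with a query, its positive passages, and mined negatives. The snippet below writes a toy file in that shape; the field names follow the FlagEmbedding documentation as I recall it, so treat them as an assumption and verify against the repository [2] before training.

```python
# Sketch of the assumed train.jsonl format for FlagEmbedding fine-tuning:
# {"query": str, "pos": [str, ...], "neg": [str, ...]} — verify against the repo [2].
import json

examples = [
    {
        "query": "样例查询",
        "pos": ["与查询相关的文档"],
        "neg": ["挖掘得到的中等难度负例-1", "挖掘得到的中等难度负例-2"],
    },
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```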
Reference
Evaluation results (self-reported)

| Task (MTEB) | Split      | Metric             | Value  |
|-------------|------------|--------------------|--------|
| AFQMC       | validation | cos_sim_pearson    | 49.379 |
| AFQMC       | validation | cos_sim_spearman   | 54.847 |
| AFQMC       | validation | euclidean_pearson  | 53.050 |
| AFQMC       | validation | euclidean_spearman | 54.848 |
| AFQMC       | validation | manhattan_pearson  | 53.063 |
| AFQMC       | validation | manhattan_spearman | 54.874 |
| ATEC        | test       | cos_sim_pearson    | 48.160 |
| ATEC        | test       | cos_sim_spearman   | 55.132 |
| ATEC        | test       | euclidean_pearson  | 55.436 |
| ATEC        | test       | euclidean_spearman | 55.132 |