GATE-AraBert-v0
This is a General Arabic Text Embedding (GATE) model trained with Sentence Transformers in a multi-task setup. The model is trained jointly on the AllNLI and STS datasets (see the training sketch after the model description below).
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Datasets: AllNLI, STS
- Language: ar
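The multi-task setup described above can be sketched with the Sentence Transformers trainer API as below. This is a minimal, illustrative sketch: the dataset identifiers, loss pairing, and hyperparameters are assumptions, and the actual model presumably used Arabic versions of AllNLI and STS rather than the English placeholders shown here.

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss, CosineSimilarityLoss

# Start from the base model named in the model description.
model = SentenceTransformer("Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2")

# Placeholder datasets: any (anchor, positive, negative) NLI-style set and any
# (sentence1, sentence2, score) STS-style set will work with these losses.
all_nli = load_dataset("sentence-transformers/all-nli", "triplet", split="train")
sts = load_dataset("sentence-transformers/stsb", split="train")

# Multi-task training: one loss per dataset, matched by dictionary key.
trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset={"all_nli": all_nli, "sts": sts},
    loss={
        "all_nli": MultipleNegativesRankingLoss(model),
        "sts": CosineSimilarityLoss(model),
    },
)
trainer.train()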
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v0")
# Run inference
sentences = [
    'الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.',  # "The brown dog is lying on its side on a beige rug, with a green object in the foreground."
    'لقد مات الكلب',  # "The dog has died."
    'شخص طويل القامة',  # "A tall person."
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
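Beyond pairwise scores, the same embeddings can be used for retrieval-style ranking. The snippet below is an illustrative sketch using the library's util.semantic_search helper; the corpus and query strings are made up for the example.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v0")

# Toy corpus and query (Arabic, with English glosses in the comments).
corpus = [
    'القاهرة هي عاصمة مصر.',  # "Cairo is the capital of Egypt."
    'كرة القدم رياضة شعبية.',  # "Football is a popular sport."
]
query = 'ما هي عاصمة مصر؟'  # "What is the capital of Egypt?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}, {'corpus_id': 1, 'score': ...}]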
Evaluation
Metrics
Semantic Similarity
- Dataset: sts-dev
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | 0.8384 |
| spearman_cosine | 0.8389 |
| pearson_manhattan | 0.8248 |
| spearman_manhattan | 0.8329 |
| pearson_euclidean | 0.8250 |
| spearman_euclidean | 0.8337 |
| pearson_dot | 0.8072 |
| spearman_dot | 0.8098 |
| pearson_max | 0.8384 |
| spearman_max | 0.8389 |
Semantic Similarity
- Dataset: sts-test
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | 0.7908 |
| spearman_cosine | 0.7893 |
| pearson_manhattan | 0.7923 |
| spearman_manhattan | 0.7947 |
| pearson_euclidean | 0.7904 |
| spearman_euclidean | 0.7934 |
| pearson_dot | 0.7404 |
| spearman_dot | 0.7354 |
| pearson_max | 0.7923 |
| spearman_max | 0.7947 |
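The STS figures above come from the library's EmbeddingSimilarityEvaluator. The sketch below shows how that style of evaluation can be reproduced; the dataset identifier is a placeholder, since the reported numbers were presumably computed on Arabic STS dev/test splits rather than this English one.

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v0")

# Placeholder STS data: any (sentence1, sentence2, score) pairs can be used.
sts = load_dataset("sentence-transformers/stsb", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=sts["sentence1"],
    sentences2=sts["sentence2"],
    scores=sts["score"],
    name="sts-test",
)
print(evaluator(model))  # Pearson/Spearman correlations for cosine, Euclidean, Manhattan, and dot similarities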
Model tree for Omartificial-Intelligence-Space/GATE-AraBert-v0
- Base model: aubmindlab/bert-base-arabertv02
Evaluation results
Self-reported results on MTEB MIRACLRetrieval (ar):

| Metric | Value |
|---|---|
| ndcg_at_1 | 6.181 |
| ndcg_at_3 | 7.546 |
| ndcg_at_5 | 8.949 |
| ndcg_at_10 | 11.355 |
| ndcg_at_20 | 13.562 |
| ndcg_at_100 | 17.749 |
| ndcg_at_1000 | 21.716 |
| map_at_1 | 4.181 |
| map_at_3 | 6.099 |
| map_at_5 | 6.945 |
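The MIRACLRetrieval figures are self-reported. A run along the following lines reproduces this style of evaluation with the MTEB benchmark; the exact task-selection arguments and output format vary between mteb versions, so treat this as a sketch rather than the command behind the reported numbers.

from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v0")

# Evaluate on the Arabic subset of the MIRACL retrieval task; ndcg@k and map@k
# scores are written to the output folder as JSON.
evaluation = MTEB(tasks=["MIRACLRetrieval"], task_langs=["ar"])
evaluation.run(model, output_folder="results/GATE-AraBert-v0")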