
Model Summary:

This model is a Sentence Transformer based on Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2, fine-tuned for semantic textual similarity and information retrieval. It maps sentences to dense vector representations that can be used for search, clustering, and text classification.
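A minimal usage sketch follows, assuming the model loads through the standard sentence-transformers interface and that its repository ID is mohamed2811/Muffakir_Embedding, as shown on the model page. The helper names (embed, cosine, rank) are hypothetical and only illustrate how the embeddings might be used for search; this is not the author's reference code.

```python
import math

# Repository ID as shown on the model page (assumed to be loadable as-is).
MODEL_ID = "mohamed2811/Muffakir_Embedding"


def embed(texts):
    """Encode texts into dense vectors (needs `pip install sentence-transformers`)."""
    # Imported lazily so the ranking helpers below work without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    return model.encode(texts, normalize_embeddings=True).tolist()


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Example (downloads the model, so it is left commented out):
# vectors = embed(["<question>", "<passage 1>", "<passage 2>"])
# order = rank(vectors[0], vectors[1:])
```

The ranking step is separate from the encoding step, so the same rank helper can be reused with vectors stored in an index.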

Dataset:

  • The dataset used for training is derived from Egyptian law books.
  • It consists of synthetic data generated using a Large Language Model (LLM).
  • The dataset contains 20,252 samples, formatted as question-answer pairs.

Key Features:

  • Vector Representation: 768-dimensional embeddings.
  • Training Loss: MatryoshkaLoss & MultipleNegativesRankingLoss.
  • Evaluation Metrics: Cosine similarity-based metrics (Accuracy, Precision, Recall, NDCG).
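The two training losses listed above can be illustrated with a small pure-Python sketch. It is a simplified illustration under stated assumptions, not the training code for this model: the similarity scale of 20.0, the prefix dimensions, and the equal weighting across prefixes are illustrative choices that the card does not specify. Each question is scored against every answer in the batch, with its own answer treated as the positive and the others as negatives; the Matryoshka part repeats that loss on truncated prefixes of the embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnr_loss(questions, answers, scale=20.0):
    """In-batch negatives loss: answers[i] is the positive for questions[i].

    Cross-entropy over scaled cosine similarities; scale=20.0 is an assumption.
    """
    total = 0.0
    for i, q in enumerate(questions):
        logits = [scale * cosine(q, a) for a in answers]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / len(questions)


def matryoshka_loss(questions, answers, dims=(4, 2)):
    """Sum the in-batch loss over embedding prefixes of the given sizes.

    dims is illustrative; a 768-dimensional model would use larger prefixes.
    """
    return sum(
        mnr_loss([q[:d] for q in questions], [a[:d] for a in answers])
        for d in dims
    )
```

With toy vectors, correctly paired questions and answers give a lower loss than mismatched pairs, which is the signal that pushes each question toward its own answer at every prefix size.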

🏆 Leaderboard Performance

The Muffakir_Embedding model has achieved notable rankings on the Arabic RAG Leaderboard, securing:

🥇 1st place in the Islamic Dataset

These results underscore the model's effectiveness in both retrieving relevant information and accurately ranking it within Arabic Retrieval-Augmented Generation (RAG) systems.


This model is optimized for legal document retrieval and other NLP applications in Arabic.

Model size: 135M params (Safetensors, tensor type F32)

Model ID: mohamed2811/Muffakir_Embedding