mohamed2811/Muffakir_Embedding

Model Summary:

This model is a Sentence Transformer based on Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2, fine-tuned for semantic textual similarity and information retrieval tasks. It maps sentences to dense vector representations for tasks like search, clustering, and text classification.
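
A minimal usage sketch, assuming the checkpoint loads through the `sentence-transformers` library like other Sentence Transformer models. The helper names (`load_model`, `rank_by_cosine`, `search`) are illustrative and not part of this model card; the library import is deferred so the ranking helper can be tried without downloading the weights.

```python
import math

MODEL_ID = "mohamed2811/Muffakir_Embedding"


def load_model(model_id=MODEL_ID):
    # Imported lazily: requires `pip install sentence-transformers`
    # and downloads the weights on first use.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_id)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def search(model, query, documents):
    # encode() returns one embedding per input text.
    vectors = model.encode([query] + list(documents))
    order = rank_by_cosine(vectors[0], vectors[1:])
    return [documents[i] for i in order]
```

Calling `search(load_model(), query, documents)` with an Arabic query and a list of candidate passages should return the passages ordered by similarity to the query.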

Dataset:

The dataset used for training is derived from Egyptian law books.
It consists of synthetic data generated using a Large Language Model (LLM).
The dataset contains 20,252 samples, formatted as question-answer pairs.

Key Features:

Vector Representation: 768-dimensional embeddings.
Training Loss: MatryoshkaLoss & MultipleNegativesRankingLoss.
Evaluation Metrics: Cosine similarity-based metrics (Accuracy, Precision, Recall, NDCG).
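
The two training losses listed above are typically combined by applying the ranking loss at several truncated embedding sizes and summing the results. The sketch below illustrates that idea in plain Python; it is not the training code for this model. The nested dimensions passed as `dims`, the equal weighting of each size, and the `scale` value of 20 are assumptions, since the card does not state the settings actually used.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def truncate(v, dim):
    # Matryoshka-style shortening: keep the leading `dim` components,
    # then re-normalize so cosine similarity stays meaningful.
    return normalize(v[:dim])


def mnr_loss(question_vecs, answer_vecs, scale=20.0):
    """In-batch ranking loss: answer i is the positive for question i,
    and every other answer in the batch acts as a negative."""
    questions = [normalize(q) for q in question_vecs]
    answers = [normalize(a) for a in answer_vecs]
    total = 0.0
    for i, q in enumerate(questions):
        logits = [scale * sum(x * y for x, y in zip(q, a)) for a in answers]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with label i
    return total / len(questions)


def matryoshka_mnr_loss(question_vecs, answer_vecs, dims):
    """Sum the ranking loss over each truncated size in `dims` (assumed sizes)."""
    loss = 0.0
    for dim in dims:
        q = [truncate(v, dim) for v in question_vecs]
        a = [truncate(v, dim) for v in answer_vecs]
        loss += mnr_loss(q, a)
    return loss
```

With question-answer pairs as training data, each question in a batch is pulled toward its own answer and pushed away from the other answers, at every truncated size. The same `truncate` step can be applied at inference time to trade some accuracy for smaller vectors, provided the chosen size is one the model was trained on.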

🏆 Leaderboard Performance

The Muffakir_Embedding model has achieved the following ranking on the Arabic RAG Leaderboard:

🥇 1st place on the Islamic Dataset

This result points to the model's effectiveness at both retrieving relevant information and ranking it accurately within Arabic Retrieval-Augmented Generation (RAG) systems.

This model is optimized for legal document retrieval and other NLP applications in Arabic.
