GLiNER Arabic Model (v2.1)
gliner_arabic-v2.1 is a Named Entity Recognition (NER) model specialized for Arabic text. It is fine-tuned from the urchade/gliner_multi-v2.1 base model to identify a wide range of entity types in Arabic, which makes it suitable for applications that need rich entity extraction from Arabic-language datasets. The model also supports English to a limited extent, enabling cross-lingual use cases.
This model is part of the GLiNER family (Generalist and Lightweight Model for Named Entity Recognition), in which the entity labels to extract are supplied at inference time rather than fixed during training.
Key Features
- Rich Entity Recognition: Detects a diverse set of entity types in Arabic text, including persons, organizations, locations, and dates.
- Bilingual Support: Primarily optimized for Arabic (ar) with auxiliary support for English (en).
- High Performance: Fine-tuned for robustness and accuracy in real-world Arabic NLP applications.
- Apache-2.0 License: Freely available for commercial and non-commercial use.
Model Details
Model Name: NAMAA-Space/gliner_arabic-v2.1
License: Apache-2.0
Languages: Arabic (ar), English (en)
Base Model: urchade/gliner_multi-v2.1
Pipeline Tag: Token Classification
Tags: GLiNER, Arabic, NER
Applications
The gliner_arabic-v2.1 model is well suited to:
- Extracting entities from Arabic news articles, social media, and legal documents.
- Building knowledge graphs for Arabic content.
- Enhancing search and recommendation systems with entity-aware features.
- Supporting cross-lingual applications with mixed Arabic and English text.
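For the entity-aware search use case, one possible shape is an inverted index from entity mentions to the documents that contain them. The sketch below is illustrative only: build_entity_index and the document IDs are hypothetical, and the sample predictions are hand-written dictionaries that mimic the text and label fields of the model's output rather than real model results.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_entity_index(docs: Dict[str, List[dict]]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (label, entity text) pair to the IDs of documents mentioning it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for doc_id, entities in docs.items():
        for ent in entities:
            index[(ent["label"], ent["text"])].add(doc_id)
    return dict(index)


# Hypothetical per-document predictions (labels: location, organization).
docs = {
    "doc-1": [{"text": "القاهرة", "label": "موقع"}],
    "doc-2": [
        {"text": "القاهرة", "label": "موقع"},
        {"text": "الأمم المتحدة", "label": "منظمة"},
    ],
}

index = build_entity_index(docs)
# Look up every document that mentions Cairo as a location.
print(sorted(index[("موقع", "القاهرة")]))
```

Keying on the (label, text) pair keeps a place name and an identically spelled organization apart; a real system would probably also normalize spelling variants before indexing.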
Installation
To use the gliner_arabic-v2.1 model, install the gliner library with pip:
pip install gliner
Make sure the installed dependencies are compatible with the urchade/gliner_multi-v2.1 base model.
Usage
Below is an example of how to load and use the model for NER tasks in Python:
from gliner import GLiNER
# Load the model
model = GLiNER.from_pretrained("NAMAA-Space/gliner_arabic-v2.1")
# Example text (Arabic): a news-style passage mentioning people, organizations, places, and a date
text = "غزة، مدينة يصمد شعبها الفلسطيني المحاصر بقلوب كالصخر، يواجهون الإبادة الجماعية من الكيان الصهيوني برعاية أمريكية وخذلان العالم أجمع، حيث يقاوم أهلها، بقيادة يحيى السنوار ومحمد الضيف، مع فصائل حماس تحت القصف والحصار والموت منذ 7 أكتوبر 2023، وسط صمت الأمم المتحدة والاتحاد الأوروبي، بينما تجري مفاوضات في القاهرة بوساطة مصر وقطر."
# Entity labels in Arabic: person, organization, date, location
labels = ["شخص", "منظمة", "تاريخ", "موقع"]
# Perform entity prediction
entities = model.predict_entities(text, labels, threshold=0.5)
# Display predicted entities and their labels
for entity in entities:
    print(f"Entity: {entity['text']} | Label: {entity['label']} | Score: {entity['score']:.3f}")
Example Output
Entity: غزة | Label: موقع | Score: 0.797
Entity: الكيان الصهيوني | Label: منظمة | Score: 0.783
Entity: يحيى السنوار | Label: شخص | Score: 0.917
Entity: فصائل حماس | Label: منظمة | Score: 0.551
Entity: حماس | Label: منظمة | Score: 0.588
Entity: 7 أكتوبر 2023 | Label: تاريخ | Score: 0.837
Entity: الأمم المتحدة | Label: منظمة | Score: 0.823
Entity: القاهرة | Label: موقع | Score: 0.773
Entity: مصر | Label: موقع | Score: 0.588
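The output above contains nested spans: the phrase فصائل حماس and the shorter حماس inside it are both returned as organizations. If a downstream step needs at most one entity per stretch of text, overlapping predictions can be collapsed afterwards. The following is a minimal sketch, not part of the gliner API: drop_overlaps is a hypothetical helper, it assumes each prediction carries start and end character offsets next to text, label, and score, and keeping the higher-scoring span is only one possible policy. The sample offsets were worked out by hand for a three-word phrase from the example text.

```python
from typing import Dict, List


def drop_overlaps(entities: List[Dict]) -> List[Dict]:
    """Keep the highest-scoring span from each group of overlapping spans."""
    kept: List[Dict] = []
    # Visit spans from most to least confident so stronger spans win.
    for ent in sorted(entities, key=lambda e: e["score"], reverse=True):
        # Half-open ranges [start, end) overlap when each starts before the other ends.
        clash = any(ent["start"] < k["end"] and k["start"] < ent["end"] for k in kept)
        if not clash:
            kept.append(ent)
    # Restore reading order by start offset.
    return sorted(kept, key=lambda e: e["start"])


# Hand-built sample; offsets are character positions in `phrase`.
phrase = "مع فصائل حماس"
sample = [
    {"start": 3, "end": 13, "text": "فصائل حماس", "label": "منظمة", "score": 0.551},
    {"start": 9, "end": 13, "text": "حماس", "label": "منظمة", "score": 0.588},
]

flat = drop_overlaps(sample)
print([e["text"] for e in flat])
```

Preferring the longest span instead of the highest score would keep the full phrase here; which policy fits depends on whether the application wants the organization name alone or the phrase that contains it.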
Limitations
- Primary Focus on Arabic: While the model supports English, its performance is optimized for Arabic text. English entity recognition may not match native English models.
- Context Sensitivity: Performance may vary depending on the complexity of the text and the presence of ambiguous entities.
- Label Dependency: The model requires predefined entity labels for prediction, which may limit its flexibility in open-domain settings.
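One practical lever for the context-sensitivity issue is the threshold passed to predict_entities. Assuming that lower scores tend to mark the less certain mentions, raising the cut-off trades recall for precision. The sketch below replays the predictions from the Example Output section against a stricter cut-off without calling the model; the filter_by_score helper and the 0.6 value are illustrative assumptions, not recommendations from the model authors.

```python
from collections import Counter
from typing import Dict, List

# Predictions transcribed from the Example Output section.
predictions: List[Dict] = [
    {"text": "غزة", "label": "موقع", "score": 0.797},
    {"text": "الكيان الصهيوني", "label": "منظمة", "score": 0.783},
    {"text": "يحيى السنوار", "label": "شخص", "score": 0.917},
    {"text": "فصائل حماس", "label": "منظمة", "score": 0.551},
    {"text": "حماس", "label": "منظمة", "score": 0.588},
    {"text": "7 أكتوبر 2023", "label": "تاريخ", "score": 0.837},
    {"text": "الأمم المتحدة", "label": "منظمة", "score": 0.823},
    {"text": "القاهرة", "label": "موقع", "score": 0.773},
    {"text": "مصر", "label": "موقع", "score": 0.588},
]


def filter_by_score(entities: List[Dict], min_score: float) -> List[Dict]:
    """Return only the predictions whose score reaches min_score."""
    return [e for e in entities if e["score"] >= min_score]


strict = filter_by_score(predictions, 0.6)
per_label = Counter(e["label"] for e in strict)

print(len(predictions), "->", len(strict))
print(dict(per_label))
```

At 0.6, the three predictions scored 0.551 and 0.588 drop out, which removes both overlapping حماس spans but also the correct location مصر. That is the recall cost to weigh when tuning the threshold on real data.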
Contact
For questions, issues, or contributions, please reach out via the Hugging Face model page or open an issue on the repository.
Acknowledgments
This model builds upon the urchade/gliner_multi-v2.1 model and the GLiNER framework. We thank the open-source community for their contributions to Arabic NLP.