GLiNER Arabic Model (v2.1)

gliner_arabic-v2.1 is a Named Entity Recognition (NER) model specialized for Arabic text. Built on the urchade/gliner_multi-v2.1 base model, it has been fine-tuned to identify a wide range of entities in Arabic, making it suitable for applications that need rich entity extraction from Arabic-language datasets. The model also supports English to a limited extent, enabling cross-lingual use cases.

This model is part of the GLiNER family. It uses the GLiNER (Generalist and Lightweight Model for Named Entity Recognition) framework, which targets strong performance on token classification tasks.

Key Features

Rich Entity Recognition: Detects a diverse set of entities tailored for Arabic text, including but not limited to persons, organizations, locations, dates, and more.
Bilingual Support: Primarily optimized for Arabic (ar) with auxiliary support for English (en).
High Performance: Fine-tuned for robustness and accuracy in real-world Arabic NLP applications.
Apache-2.0 License: Freely available for commercial and non-commercial use.

Model Details

Model Name: NAMAA-Space/gliner_arabic-v2.1
License: Apache-2.0
Languages: Arabic (ar), English (en)
Base Model: urchade/gliner_multi-v2.1
Pipeline Tag: Token Classification
Tags: GLiNER, Arabic, NER

Applications

The gliner_arabic-v2.1 model is ideal for:

Extracting entities from Arabic news articles, social media, and legal documents.
Building knowledge graphs for Arabic content.
Enhancing search and recommendation systems with entity-aware features.
Supporting cross-lingual applications with mixed Arabic and English text.
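As one illustration of the entity-aware search use case, the sketch below builds a small inverted index from entities to the documents that mention them. It assumes each entity is a dict with "text" and "label" keys, as in the Usage section; the function name build_entity_index and the document ids are hypothetical, and the sample entities are hand-written rather than model output.

```python
from collections import defaultdict


def build_entity_index(docs_entities):
    """Map (label, entity text) to the sorted ids of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in docs_entities.items():
        for ent in entities:
            index[(ent["label"], ent["text"])].add(doc_id)
    # Convert the sets to sorted lists so the result is stable and serializable.
    return {key: sorted(ids) for key, ids in index.items()}


# Hypothetical per-document entities (hand-written, not real predictions).
docs_entities = {
    "doc1": [
        {"text": "القاهرة", "label": "موقع"},
        {"text": "الأمم المتحدة", "label": "منظمة"},
    ],
    "doc2": [
        {"text": "القاهرة", "label": "موقع"},
    ],
}

entity_index = build_entity_index(docs_entities)
print(entity_index[("موقع", "القاهرة")])
```

Keying on the (label, text) pair keeps a location and an organization with the same surface form apart, which matters when the index feeds a search filter or a knowledge graph.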

Installation

To use the gliner_arabic-v2.1 model, you need to have the gliner library installed. You can install it via pip:

pip install gliner

Ensure your installed dependencies are compatible with the urchade/gliner_multi-v2.1 base model.

Usage

Below is an example of how to load and use the model for NER tasks in Python:

from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("NAMAA-Space/gliner_arabic-v2.1")

# Example text (Arabic)
text = "غزة، مدينة يصمد شعبها الفلسطيني المحاصر بقلوب كالصخر، يواجهون الإبادة الجماعية من الكيان الصهيوني برعاية أمريكية وخذلان العالم أجمع، حيث يقاوم أهلها، بقيادة يحيى السنوار ومحمد الضيف، مع فصائل حماس تحت القصف والحصار والموت منذ 7 أكتوبر 2023، وسط صمت الأمم المتحدة والاتحاد الأوروبي، بينما تجري مفاوضات في القاهرة بوساطة مصر وقطر."
labels = ["شخص", "منظمة", "تاريخ", "موقع"]

# Perform entity prediction
entities = model.predict_entities(text, labels, threshold=0.5)

# Display predicted entities and their labels
for entity in entities:
    print(f"Entity: {entity['text']} | Label: {entity['label']} | Score: {entity['score']:.3f}")

Example Output

Entity: غزة | Label: موقع | Score: 0.797
Entity: الكيان الصهيوني | Label: منظمة | Score: 0.783
Entity: يحيى السنوار | Label: شخص | Score: 0.917
Entity: فصائل حماس | Label: منظمة | Score: 0.551
Entity: حماس | Label: منظمة | Score: 0.588
Entity: 7 أكتوبر 2023 | Label: تاريخ | Score: 0.837
Entity: الأمم المتحدة | Label: منظمة | Score: 0.823
Entity: القاهرة | Label: موقع | Score: 0.773
Entity: مصر | Label: موقع | Score: 0.588
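The output above contains overlapping spans ("فصائل حماس" and "حماس"). If a downstream step needs one entity per span, a simple post-processing pass can keep only the highest-scoring entity in each overlapping group. The sketch below is one possible approach, not part of the model or the gliner library. It assumes each entity dict also carries "start" and "end" character offsets; the helper name drop_overlaps is hypothetical, and the sample offsets are illustrative rather than taken from the text above.

```python
def drop_overlaps(entities):
    """Keep the highest-scoring entity among any group of overlapping spans."""
    kept = []
    # Visit candidates from most to least confident.
    for ent in sorted(entities, key=lambda e: e["score"], reverse=True):
        overlaps = any(
            ent["start"] < k["end"] and k["start"] < ent["end"] for k in kept
        )
        if not overlaps:
            kept.append(ent)
    # Restore reading order by start offset.
    return sorted(kept, key=lambda e: e["start"])


# Hand-written sample mirroring rows of the example output; offsets are illustrative.
sample = [
    {"start": 0, "end": 10, "text": "فصائل حماس", "label": "منظمة", "score": 0.551},
    {"start": 6, "end": 10, "text": "حماس", "label": "منظمة", "score": 0.588},
    {"start": 20, "end": 27, "text": "القاهرة", "label": "موقع", "score": 0.773},
]

deduped = drop_overlaps(sample)
for ent in deduped:
    print(f"Entity: {ent['text']} | Label: {ent['label']} | Score: {ent['score']:.3f}")
```

Preferring the higher score is only one policy. Preferring the longer span is an equally reasonable choice when full names matter more than confidence.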

Limitations

Primary Focus on Arabic: While the model supports English, its performance is optimized for Arabic text. English entity recognition may not match native English models.
Context Sensitivity: Performance may vary depending on the complexity of the text and the presence of ambiguous entities.
Label Dependency: The model requires predefined entity labels for prediction, which may limit its flexibility in open-domain settings.
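Because scores vary with context, it can help to run prediction once with a low threshold and then compare how many entities survive at stricter cutoffs before settling on a value. The sketch below works on already-predicted entities and assumes only the "text", "label", and "score" keys shown in the Usage section. The helper name filter_by_threshold is hypothetical, and the sample scores are copied from the example output.

```python
def filter_by_threshold(entities, threshold):
    """Return the entities whose score is at least the given threshold."""
    return [e for e in entities if e["score"] >= threshold]


# Three rows taken from the example output above.
scored = [
    {"text": "يحيى السنوار", "label": "شخص", "score": 0.917},
    {"text": "مصر", "label": "موقع", "score": 0.588},
    {"text": "فصائل حماس", "label": "منظمة", "score": 0.551},
]

# Compare which entities survive at each candidate threshold.
for threshold in (0.5, 0.6, 0.9):
    survivors = filter_by_threshold(scored, threshold)
    print(threshold, [e["text"] for e in survivors])
```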

Contact

For questions, issues, or contributions, please reach out via the Hugging Face model page or open an issue on the repository.

Acknowledgments

This model builds upon the foundational work of the urchade/gliner_multi-v2.1 model and the GLiNER framework. We thank the open-source community for their contributions to Arabic NLP.
