Italian_NER_XXL_v2

🚀 Model Overview

Welcome to the second generation of our state-of-the-art Named Entity Recognition model for Italian text. Building on the previous version, Italian_NER_XXL_v2 reaches an accuracy of 87.5% and an F1 score of 89.2%, an accuracy gain of 8.5 percentage points over our previous model (79%).

💡 Key Improvements

  • Enhanced Accuracy: From 79% to 87.5%
  • Better Context Understanding: Improved recognition of entities in complex sentences
  • Reduced False Positives: More precise identification of sensitive information
  • Expanded Training Data: Trained on a more diverse corpus of Italian text

🏆 Market Leadership

Italian_NER_XXL_v2 remains the only model in Italy capable of identifying a comprehensive range of 52 different entity categories, maintaining our unique position in the Italian NLP landscape. This unparalleled breadth of entity recognition makes our model the premier choice for privacy, legal, and financial applications.

📋 Recognized Categories

Our model identifies an extensive range of entities across multiple domains:

Personal Information

  • NOME: First name of a person
  • COGNOME: Last name of a person
  • DATA_NASCITA: Date of birth
  • DATA_MORTE: Date of death
  • ETA: Age of a person
  • CODICE_FISCALE: Italian tax code
  • PROFESSIONE: Occupation or profession
  • STATO_CIVILE: Civil status

Contact Information

  • INDIRIZZO: Physical address
  • NUMERO_TELEFONO: Phone number
  • EMAIL: Email address
  • CODICE_POSTALE: Postal code

Financial Information

  • VALUTA: Currency
  • IMPORTO: Monetary amount
  • NUMERO_CARTA: Credit/debit card number
  • CVV: Card security code
  • NUMERO_CONTO: Bank account number
  • IBAN: International bank account number
  • BIC: Bank identifier code
  • P_IVA: VAT number
  • TASSO_MUTUO: Mortgage rate
  • NUM_ASSEGNO_BANCARIO: Bank check number
  • BANCA: Bank name

Legal Entities

  • RAGIONE_SOCIALE: Company legal name
  • TRIBUNALE: Court identifier
  • LEGGE: Law reference
  • N_SENTENZA: Court judgment (sentenza) number
  • N_LICENZA: License number
  • AVV_NOTAIO: Lawyer or notary reference
  • REGIME_PATRIMONIALE: Marital property regime

Medical Information

  • CARTELLA_CLINICA: Medical record
  • MALATTIA: Disease or medical condition
  • MEDICINA: Medicine or medical treatment
  • STORIA_CLINICA: Clinical history
  • STRENGTH: Medicine strength
  • FREQUENZA: Treatment frequency
  • DURATION: Duration of treatment
  • DOSAGGIO: Medicine dosage
  • FORM: Medicine form (e.g., tablet)

Technical Information

  • IP: IP address
  • IPV6_1: IPv6 address
  • MAC: MAC address
  • USER_AGENT: Browser user agent
  • IMEI: Mobile device identifier

Geographic and Temporal Data

  • STATO: Country or nation
  • LUOGO: Geographic location
  • ORARIO: Specific time
  • DATA: Generic date

Document and Vehicle Information

  • NUMERO_DOCUMENTO: Document number
  • TARGA_VEICOLO: Vehicle license plate
  • FOGLIO: Document sheet reference
  • PARTICELLA: Land registry parcel
  • MAPPALE: Land registry map reference
  • SUBALTERNO: Land registry sub-unit reference

Web and Security

  • URL: Web address
  • PASSWORD: Password
  • PIN: Personal identification number
  • BRAND: Commercial brand or trademark
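
The grouping above can also be kept as a small lookup table, which is handy when an application only cares about some domains. The sketch below is illustrative only: `CATEGORY_GROUPS`, `LABEL_TO_GROUP`, and `domain_of` are hypothetical names that are not part of the model or of `transformers`, and it assumes the label strings the model emits match the names listed above.

```python
# Hypothetical lookup table built from the category list above.
# Assumption: the model's emitted labels match these names exactly.
CATEGORY_GROUPS = {
    "Personal Information": [
        "NOME", "COGNOME", "DATA_NASCITA", "DATA_MORTE",
        "ETA", "CODICE_FISCALE", "PROFESSIONE", "STATO_CIVILE",
    ],
    "Contact Information": [
        "INDIRIZZO", "NUMERO_TELEFONO", "EMAIL", "CODICE_POSTALE",
    ],
    "Financial Information": [
        "VALUTA", "IMPORTO", "NUMERO_CARTA", "CVV", "NUMERO_CONTO", "IBAN",
        "BIC", "P_IVA", "TASSO_MUTUO", "NUM_ASSEGNO_BANCARIO", "BANCA",
    ],
    "Legal Entities": [
        "RAGIONE_SOCIALE", "TRIBUNALE", "LEGGE", "N_SENTENZA",
        "N_LICENZA", "AVV_NOTAIO", "REGIME_PATRIMONIALE",
    ],
    "Medical Information": [
        "CARTELLA_CLINICA", "MALATTIA", "MEDICINA", "STORIA_CLINICA",
        "STRENGTH", "FREQUENZA", "DURATION", "DOSAGGIO", "FORM",
    ],
    "Technical Information": ["IP", "IPV6_1", "MAC", "USER_AGENT", "IMEI"],
    "Geographic and Temporal Data": ["STATO", "LUOGO", "ORARIO", "DATA"],
    "Document and Vehicle Information": [
        "NUMERO_DOCUMENTO", "TARGA_VEICOLO", "FOGLIO",
        "PARTICELLA", "MAPPALE", "SUBALTERNO",
    ],
    "Web and Security": ["URL", "PASSWORD", "PIN", "BRAND"],
}

# Invert the table once so each label maps to its group.
LABEL_TO_GROUP = {
    label: group
    for group, labels in CATEGORY_GROUPS.items()
    for label in labels
}


def domain_of(label):
    """Return the group a label belongs to, or None if it is not listed."""
    return LABEL_TO_GROUP.get(label)
```

For example, `domain_of("IBAN")` returns `"Financial Information"`, so a caller could keep only the entities whose group is relevant to its task.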

💻 Implementation

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DeepMount00/Italian_NER_XXL_v2")
model = AutoModelForTokenClassification.from_pretrained("DeepMount00/Italian_NER_XXL_v2")

# Create NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example text
example = """Il commendatore Gianluigi Alberico De Laurentis-Ponti, con residenza legale in Corso Imperatrice 67, 
Torino, avente codice fiscale DLNGGL60B01L219P, è amministratore delegato della "De Laurentis Advanced Engineering 
Group S.p.A.", che si trova in Piazza Affari 32, Milano (MI); con una partita IVA di 09876543210, la società è stata 
recentemente incaricata di sviluppare una nuova linea di componenti aerospaziali per il progetto internazionale 
di esplorazione di Marte."""

# Run NER
ner_results = nlp(example)

# Process results
for entity in ner_results:
    print(f"{entity['entity_group']}: {entity['word']} (confidence: {entity['score']:.4f})")
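
In practice it is often useful to discard low-confidence predictions and collect the remaining entities by category. The following is a minimal sketch under stated assumptions: `group_entities` is a hypothetical helper, the 0.5 threshold is arbitrary rather than a tuned value, and `sample_results` is hand-written data standing in for real model output.

```python
from collections import defaultdict


def group_entities(ner_results, min_score=0.5):
    """Group pipeline-style entity dicts by label, dropping low-confidence hits.

    Assumption: each item has 'entity_group', 'word' and 'score' keys, as in
    the loop above. The default threshold is arbitrary, not a tuned value.
    """
    grouped = defaultdict(list)
    for entity in ner_results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written stand-in for real model output, for illustration only.
sample_results = [
    {"entity_group": "NOME", "word": "Gianluigi", "score": 0.99},
    {"entity_group": "COGNOME", "word": "De Laurentis", "score": 0.97},
    {"entity_group": "LUOGO", "word": "Torino", "score": 0.42},
]

print(group_entities(sample_results))
```

With the sample data, the `LUOGO` entry falls below the threshold and is dropped. Passing the real `ner_results` instead of `sample_results` should work the same way, provided the pipeline output has the keys assumed here.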

🚀 Use Cases

  • Privacy Compliance: GDPR data mapping and PII detection
  • Document Anonymization: Automated redaction of sensitive information
  • Legal Document Analysis: Extraction of key entities from contracts and legal texts
  • Financial Monitoring: Detection of financial entities for compliance and fraud prevention
  • Medical Record Processing: Structured extraction from clinical notes and reports
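
As an illustration of the anonymization use case, the sketch below replaces detected spans with a placeholder naming their category. It is not the author's redaction method: `redact` is a hypothetical helper, it assumes the entities carry `start` and `end` character offsets (which the `transformers` NER pipeline generally provides with a fast tokenizer) and that spans do not overlap, and the sample entities are hand-written rather than real model output.

```python
def redact(text, entities, labels=None):
    """Replace each detected span with a [LABEL] placeholder.

    Assumptions: `entities` are pipeline-style dicts with 'entity_group',
    'start' and 'end' character offsets, and spans do not overlap.
    `labels` optionally restricts redaction to a set of categories.
    """
    # Work from the end of the text so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if labels is not None and ent["entity_group"] not in labels:
            continue
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written entities standing in for real model output.
sample_text = "Mario Rossi vive a Torino."
sample_entities = [
    {"entity_group": "NOME", "start": 0, "end": 5},
    {"entity_group": "COGNOME", "start": 6, "end": 11},
    {"entity_group": "LUOGO", "start": 19, "end": 25},
]

print(redact(sample_text, sample_entities))
# prints: [NOME] [COGNOME] vive a [LUOGO].
```

Passing `labels={"LUOGO"}` would redact only the location and leave the name in place. Any real anonymization workflow would still need human review, since missed entities remain in the text.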

🔮 Future Development

We're committed to continuous improvement of the model:

  • Quarterly updates with further accuracy enhancements
  • Expansion to include new entity types based on user feedback
  • Development of domain-specific variants for specialized applications
  • Integration of contextual entity linking capabilities

👥 Contribution and Contact

Your feedback is essential to improving this model. If you're interested in contributing, have suggestions, or need a customized NER solution, please contact:

Michele Montebovi
Email: [email protected]

We welcome collaboration from the Italian NLP community to further enhance this tool and expand its applications across industries.

📝 Citation

If you use this model in your research or applications, please cite:

@misc{montebovi2025italiannerxxl,
  author = {Montebovi, Michele},
  title = {Italian\_NER\_XXL\_v2: A Comprehensive Named Entity Recognition Model for Italian},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/DeepMount00/Italian_NER_XXL_v2}}
}