Italian_NER_XXL_v2

🚀 Model Overview

Welcome to the second generation of our state-of-the-art Named Entity Recognition model for Italian text. Building on our previous version, Italian_NER_XXL_v2 reaches an accuracy of 87.5% and an F1 score of 89.2%. Accuracy is up from 79% in the previous model, an improvement of 8.5 percentage points.

💡 Key Improvements

Enhanced Accuracy: From 79% to 87.5%
Better Context Understanding: Improved recognition of entities in complex sentences
Reduced False Positives: More precise identification of sensitive information
Expanded Training Data: Trained on a more diverse corpus of Italian text

🏆 Market Leadership

Italian_NER_XXL_v2 remains the only model in Italy capable of identifying a comprehensive range of 52 different entity categories, maintaining our unique position in the Italian NLP landscape. This unparalleled breadth of entity recognition makes our model the premier choice for privacy, legal, and financial applications.

📋 Recognized Categories

Our model identifies an extensive range of entities across multiple domains:

Personal Information

NOME: First name of a person
COGNOME: Last name of a person
DATA_NASCITA: Date of birth
DATA_MORTE: Date of death
ETA: Age of a person
CODICE_FISCALE: Italian tax code
PROFESSIONE: Occupation or profession
STATO_CIVILE: Marital status

Contact Information

INDIRIZZO: Physical address
NUMERO_TELEFONO: Phone number
EMAIL: Email address
CODICE_POSTALE: Postal code

Financial Information

VALUTA: Currency
IMPORTO: Monetary amount
NUMERO_CARTA: Credit/debit card number
CVV: Card security code
NUMERO_CONTO: Bank account number
IBAN: International bank account number
BIC: Bank identifier code
P_IVA: VAT number
TASSO_MUTUO: Mortgage rate
NUM_ASSEGNO_BANCARIO: Bank check number
BANCA: Bank name

Legal Entities

RAGIONE_SOCIALE: Company legal name
TRIBUNALE: Court identifier
LEGGE: Law reference
N_SENTENZA: Court judgment number
N_LICENZA: License number
AVV_NOTAIO: Lawyer or notary reference
REGIME_PATRIMONIALE: Marital property regime

Medical Information

CARTELLA_CLINICA: Medical record
MALATTIA: Disease or medical condition
MEDICINA: Medicine or medical treatment
STORIA_CLINICA: Clinical history
STRENGTH: Medicine strength
FREQUENZA: Treatment frequency
DURATION: Duration of treatment
DOSAGGIO: Medicine dosage
FORM: Medicine form (e.g., tablet)

Technical Information

IP: IP address
IPV6_1: IPv6 address
MAC: MAC address
USER_AGENT: Browser user agent
IMEI: Mobile device identifier

Geographic and Temporal Data

STATO: Country or nation
LUOGO: Geographic location
ORARIO: Specific time
DATA: Generic date

Document and Vehicle Information

NUMERO_DOCUMENTO: Document number
TARGA_VEICOLO: Vehicle license plate
FOGLIO: Document sheet reference
PARTICELLA: Land registry parcel
MAPPALE: Land registry map reference
SUBALTERNO: Land registry sub-unit reference

Web and Security

URL: Web address
PASSWORD: Password
PIN: Personal identification number
BRAND: Commercial brand or trademark

💻 Implementation

# PyTorch must be installed as the backend, but it does not need to be imported here
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DeepMount00/Italian_NER_XXL_v2")
model = AutoModelForTokenClassification.from_pretrained("DeepMount00/Italian_NER_XXL_v2")

# Create NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example text
example = """Il commendatore Gianluigi Alberico De Laurentis-Ponti, con residenza legale in Corso Imperatrice 67, 
Torino, avente codice fiscale DLNGGL60B01L219P, è amministratore delegato della "De Laurentis Advanced Engineering 
Group S.p.A.", che si trova in Piazza Affari 32, Milano (MI); con una partita IVA di 09876543210, la società è stata 
recentemente incaricata di sviluppare una nuova linea di componenti aerospaziali per il progetto internazionale 
di esplorazione di Marte."""

# Run NER
ner_results = nlp(example)

# Process results
for entity in ner_results:
    print(f"{entity['entity_group']}: {entity['word']} (confidence: {entity['score']:.4f})")
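
The loop above prints every hit as it comes. As a minimal sketch of a next step, the results can be filtered by confidence and grouped by category. The helper name `group_entities`, the `min_score` threshold, and the hand-written `sample_results` list are assumptions for illustration only. The sample mimics the dictionary shape the pipeline returns with aggregation_strategy="simple" and is not real model output.

```python
from collections import defaultdict


def group_entities(ner_results, min_score=0.5):
    """Group pipeline results by category, dropping hits below min_score.

    Hypothetical helper: expects dicts with 'entity_group', 'word' and 'score',
    as produced by the pipeline with aggregation_strategy="simple".
    """
    grouped = defaultdict(list)
    for entity in ner_results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample in the pipeline's output shape (illustrative values only)
sample_results = [
    {"entity_group": "NOME", "word": "Gianluigi", "score": 0.99},
    {"entity_group": "COGNOME", "word": "De Laurentis-Ponti", "score": 0.97},
    {"entity_group": "LUOGO", "word": "Torino", "score": 0.95},
    {"entity_group": "LUOGO", "word": "Milano", "score": 0.62},
]

for label, words in group_entities(sample_results, min_score=0.80).items():
    print(f"{label}: {words}")
```

With the real pipeline, passing `ner_results` instead of `sample_results` should work the same way. The right threshold depends on the application and is best chosen on your own data.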

🚀 Use Cases

Privacy Compliance: GDPR data mapping and PII detection
Document Anonymization: Automated redaction of sensitive information
Legal Document Analysis: Extraction of key entities from contracts and legal texts
Financial Monitoring: Detection of financial entities for compliance and fraud prevention
Medical Record Processing: Structured extraction from clinical notes and reports
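
For the anonymization use case, one possible approach is to replace each detected span with a placeholder built from its category. The sketch below is an illustration under stated assumptions, not the model's official redaction method. The names `redact` and `make_span` are hypothetical, the sample entities are hand-written rather than real model output, and the code assumes the detected spans do not overlap. It relies only on the `start`, `end`, `score` and `entity_group` fields the pipeline returns with aggregation_strategy="simple".

```python
def redact(text, entities, labels=None, min_score=0.5):
    """Replace detected spans with [LABEL] placeholders.

    Hypothetical helper: 'entities' are dicts with 'entity_group', 'start',
    'end' and 'score'. If 'labels' is given, only those categories are redacted.
    Assumes spans do not overlap.
    """
    selected = [
        e for e in entities
        if e["score"] >= min_score and (labels is None or e["entity_group"] in labels)
    ]
    # Work from the end of the text so earlier character offsets stay valid
    for e in sorted(selected, key=lambda e: e["start"], reverse=True):
        text = text[:e["start"]] + f"[{e['entity_group']}]" + text[e["end"]:]
    return text


def make_span(text, label, value, score):
    """Build a sample entity by locating 'value' in 'text' (illustration only)."""
    start = text.index(value)
    return {"entity_group": label, "start": start, "end": start + len(value), "score": score}


sample_text = "Mario Rossi, codice fiscale RSSMRA80A01H501U, vive a Roma."
sample_entities = [
    make_span(sample_text, "NOME", "Mario", 0.99),
    make_span(sample_text, "COGNOME", "Rossi", 0.98),
    make_span(sample_text, "CODICE_FISCALE", "RSSMRA80A01H501U", 0.97),
    make_span(sample_text, "LUOGO", "Roma", 0.91),
]

# Redact only personal identifiers and leave the location in place
print(redact(sample_text, sample_entities, labels={"NOME", "COGNOME", "CODICE_FISCALE"}))
```

Restricting `labels` lets a pipeline redact, say, only personal and financial categories while keeping dates or places readable. Any automated redaction of this kind should be reviewed before it is relied on for compliance.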

🔮 Future Development

We're committed to continuous improvement of the model:

Quarterly updates with further accuracy enhancements
Expansion to include new entity types based on user feedback
Development of domain-specific variants for specialized applications
Integration of contextual entity linking capabilities

👥 Contribution and Contact

Your feedback is essential to improving this model. If you're interested in contributing, have suggestions, or need a customized NER solution, please contact:

Michele Montebovi
Email: [email protected]

We welcome collaboration from the Italian NLP community to further enhance this tool and expand its applications across industries.

📝 Citation

If you use this model in your research or applications, please cite:

@misc{montebovi2025italiannerxxl,
  author = {Montebovi, Michele},
  title = {Italian\_NER\_XXL\_v2: A Comprehensive Named Entity Recognition Model for Italian},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/DeepMount00/Italian_NER_XXL_v2}}
}
