Italian_NER_XXL_v2
Model Overview
Welcome to the second generation of our state-of-the-art Named Entity Recognition model for Italian text. Building on the previous version, Italian_NER_XXL_v2 reaches an accuracy of 87.5% and an F1 score of 89.2%, with accuracy up 8.5 percentage points from the 79% of our previous model.
Key Improvements
- Enhanced Accuracy: From 79% to 87.5%
- Better Context Understanding: Improved recognition of entities in complex sentences
- Reduced False Positives: More precise identification of sensitive information
- Expanded Training Data: Trained on a more diverse corpus of Italian text
Market Leadership
To our knowledge, Italian_NER_XXL_v2 remains the only model for Italian capable of identifying a comprehensive range of 52 different entity categories. This breadth of entity recognition makes the model a strong choice for privacy, legal, and financial applications.
Recognized Categories
Our model identifies an extensive range of entities across multiple domains:
Personal Information
- NOME: First name of a person
- COGNOME: Last name of a person
- DATA_NASCITA: Date of birth
- DATA_MORTE: Date of death
- ETA: Age of a person
- CODICE_FISCALE: Italian tax code
- PROFESSIONE: Occupation or profession
- STATO_CIVILE: Civil status
Contact Information
- INDIRIZZO: Physical address
- NUMERO_TELEFONO: Phone number
- EMAIL: Email address
- CODICE_POSTALE: Postal code
Financial Information
- VALUTA: Currency
- IMPORTO: Monetary amount
- NUMERO_CARTA: Credit/debit card number
- CVV: Card security code
- NUMERO_CONTO: Bank account number
- IBAN: International bank account number
- BIC: Bank identifier code
- P_IVA: VAT number
- TASSO_MUTUO: Mortgage rate
- NUM_ASSEGNO_BANCARIO: Bank check number
- BANCA: Bank name
Legal Entities
- RAGIONE_SOCIALE: Company legal name
- TRIBUNALE: Court identifier
- LEGGE: Law reference
- N_SENTENZA: Court judgment (sentenza) number
- N_LICENZA: License number
- AVV_NOTAIO: Lawyer or notary reference
- REGIME_PATRIMONIALE: Property regime
Medical Information
- CARTELLA_CLINICA: Medical record
- MALATTIA: Disease or medical condition
- MEDICINA: Medicine or medical treatment
- STORIA_CLINICA: Clinical history
- STRENGTH: Medicine strength
- FREQUENZA: Treatment frequency
- DURATION: Duration of treatment
- DOSAGGIO: Medicine dosage
- FORM: Medicine form (e.g., tablet)
Technical Information
- IP: IP address
- IPV6_1: IPv6 address
- MAC: MAC address
- USER_AGENT: Browser user agent
- IMEI: Mobile device identifier
Geographic and Temporal Data
- STATO: Country or nation
- LUOGO: Geographic location
- ORARIO: Specific time
- DATA: Generic date
Document and Vehicle Information
- NUMERO_DOCUMENTO: Document number
- TARGA_VEICOLO: Vehicle license plate
- FOGLIO: Document sheet reference
- PARTICELLA: Land registry parcel
- MAPPALE: Land registry map reference
- SUBALTERNO: Land registry sub-unit reference
Web and Security
- URL: Web address
- PASSWORD: Password
- PIN: Personal identification number
- BRAND: Commercial brand or trademark
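When only one domain matters for an application, the predicted entities can be filtered by label. The snippet below is a minimal sketch: the `DOMAIN_LABELS` grouping and the `filter_by_domain` helper are hypothetical names introduced here, covering only a subset of the labels listed above, and the sample entities are hand-written rather than real model output.

```python
# Hypothetical grouping that mirrors some of the headings above (subset of labels only).
DOMAIN_LABELS = {
    "personal": {"NOME", "COGNOME", "DATA_NASCITA", "CODICE_FISCALE"},
    "financial": {"IBAN", "BIC", "P_IVA", "NUMERO_CARTA", "CVV", "IMPORTO"},
    "technical": {"IP", "IPV6_1", "MAC", "USER_AGENT", "IMEI"},
}


def filter_by_domain(entities, domain):
    """Keep only the entities whose label belongs to the given domain."""
    wanted = DOMAIN_LABELS[domain]
    return [e for e in entities if e["entity_group"] in wanted]


# Hand-written sample (not model output), using the "entity_group" and "word" keys.
sample = [
    {"entity_group": "NOME", "word": "Gianluigi"},
    {"entity_group": "P_IVA", "word": "09876543210"},
]

financial = filter_by_domain(sample, "financial")
print([e["word"] for e in financial])  # prints ['09876543210']
```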
Implementation
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DeepMount00/Italian_NER_XXL_v2")
model = AutoModelForTokenClassification.from_pretrained("DeepMount00/Italian_NER_XXL_v2")

# Create NER pipeline; "simple" aggregation merges sub-word tokens into whole entities
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example text
example = """Il commendatore Gianluigi Alberico De Laurentis-Ponti, con residenza legale in Corso Imperatrice 67,
Torino, avente codice fiscale DLNGGL60B01L219P, è amministratore delegato della "De Laurentis Advanced Engineering
Group S.p.A.", che si trova in Piazza Affari 32, Milano (MI); con una partita IVA di 09876543210, la società è stata
recentemente incaricata di sviluppare una nuova linea di componenti aerospaziali per il progetto internazionale
di esplorazione di Marte."""

# Run NER
ner_results = nlp(example)

# Process results: print each entity's label, text, and confidence score
for entity in ner_results:
    print(f"{entity['entity_group']}: {entity['word']} (confidence: {entity['score']:.4f})")
```
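Downstream code often wants the results grouped by label and stripped of low-confidence predictions. The following is a rough sketch of that post-processing step: `group_confident` and the `0.80` threshold are illustrative assumptions, not values recommended by the model authors, and `mock_results` is hand-written data that only imitates the `entity_group`, `word`, and `score` keys used in the loop above.

```python
from collections import defaultdict


def group_confident(entities, min_score=0.80):
    """Group entity texts by label, dropping predictions below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written stand-in for pipeline output (scores are invented for illustration).
mock_results = [
    {"entity_group": "NOME", "word": "Gianluigi Alberico", "score": 0.97},
    {"entity_group": "COGNOME", "word": "De Laurentis-Ponti", "score": 0.95},
    {"entity_group": "LUOGO", "word": "Marte", "score": 0.41},
]

print(group_confident(mock_results))
```

With the real pipeline, `group_confident(ner_results)` would be called in place of the mock list; a suitable threshold depends on how costly missed entities are compared with false positives.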
Use Cases
- Privacy Compliance: GDPR data mapping and PII detection
- Document Anonymization: Automated redaction of sensitive information
- Legal Document Analysis: Extraction of key entities from contracts and legal texts
- Financial Monitoring: Detection of financial entities for compliance and fraud prevention
- Medical Record Processing: Structured extraction from clinical notes and reports
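As an illustration of the document anonymization use case, the sketch below replaces each detected span with a placeholder such as `[NOME]`. It relies on the `start` and `end` character offsets that the Transformers token classification pipeline includes in its aggregated output; the `redact` helper is a hypothetical name, the sample entities are hand-written, and the code assumes spans do not overlap.

```python
def redact(text, entities):
    """Replace each entity span with a [LABEL] placeholder.

    Spans are processed from right to left so that earlier offsets
    remain valid after each replacement. Assumes non-overlapping spans.
    """
    redacted = text
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{entity['entity_group']}]"
        redacted = redacted[: entity["start"]] + placeholder + redacted[entity["end"]:]
    return redacted


# Hand-written sample (not model output) with character offsets into `text`.
text = "Mario Rossi vive a Torino."
entities = [
    {"entity_group": "NOME", "start": 0, "end": 5},
    {"entity_group": "COGNOME", "start": 6, "end": 11},
    {"entity_group": "LUOGO", "start": 19, "end": 25},
]

print(redact(text, entities))  # prints: [NOME] [COGNOME] vive a [LUOGO].
```

For real redaction workflows, missed entities matter more than spurious ones, so the output would normally still need human review.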
Future Development
We're committed to continuous improvement of the model:
- Quarterly updates with further accuracy enhancements
- Expansion to include new entity types based on user feedback
- Development of domain-specific variants for specialized applications
- Integration of contextual entity linking capabilities
Contribution and Contact
Your feedback is essential to improving this model. If you're interested in contributing, have suggestions, or need a customized NER solution, please contact:
Michele Montebovi
Email: [email protected]
We welcome collaboration from the Italian NLP community to further enhance this tool and expand its applications across industries.
Citation
If you use this model in your research or applications, please cite:
@misc{montebovi2025italiannerxxl,
author = {Montebovi, Michele},
title = {Italian\_NER\_XXL\_v2: A Comprehensive Named Entity Recognition Model for Italian},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/DeepMount00/Italian_NER_XXL_v2}}
}