---
license: apache-2.0
language:
- it
- en
pipeline_tag: token-classification
tags:
- legal
- finance
- medical
- privacy
- named-entity-recognition
---

# Italian_NER_XXL_v2

## 🚀 Model Overview
Welcome to the second generation of our state-of-the-art Named Entity Recognition (NER) model for Italian text. Building on the previous version, Italian_NER_XXL_v2 reaches an **accuracy of 87.5%** and an **F1 score of 89.2%**, with accuracy up by more than 8 percentage points over the previous model (from 79%).

## 💡 Key Improvements
- **Enhanced Accuracy**: From 79% to 87.5%
- **Better Context Understanding**: Improved recognition of entities in complex sentences
- **Reduced False Positives**: More precise identification of sensitive information
- **Expanded Training Data**: Trained on a more diverse corpus of Italian text

## 🏆 Market Leadership
Italian_NER_XXL_v2 remains the only model in Italy capable of identifying a comprehensive range of **52** different entity categories, maintaining our unique position in the Italian NLP landscape. This unparalleled breadth of entity recognition makes our model the premier choice for privacy, legal, and financial applications.

## 🔬 Technical Foundation
The model builds on a transformer-based architecture, specifically a fine-tuned BERT variant optimized for Italian language understanding. We've implemented several techniques, including:

- Custom attention mechanisms for better contextual understanding
- Specialized token classification heads for each entity category
- Enhanced preprocessing pipeline for Italian text

## 📋 Recognized Categories
Our model identifies an extensive range of entities across multiple domains:

### Personal Information
- **NOME**: First name of a person
- **COGNOME**: Last name of a person
- **DATA_NASCITA**: Date of birth
- **DATA_MORTE**: Date of death
- **ETA**: Age of a person
- **CODICE_FISCALE**: Italian tax code
- **PROFESSIONE**: Occupation or profession
- **STATO_CIVILE**: Civil status

### Contact Information
- **INDIRIZZO**: Physical address
- **NUMERO_TELEFONO**: Phone number
- **EMAIL**: Email address
- **CODICE_POSTALE**: Postal code

### Financial Information
- **VALUTA**: Currency
- **IMPORTO**: Monetary amount
- **NUMERO_CARTA**: Credit/debit card number
- **CVV**: Card security code
- **NUMERO_CONTO**: Bank account number
- **IBAN**: International bank account number
- **BIC**: Bank identifier code
- **P_IVA**: VAT number
- **TASSO_MUTUO**: Mortgage rate
- **NUM_ASSEGNO_BANCARIO**: Bank check number
- **BANCA**: Bank name

### Legal Entities
- **RAGIONE_SOCIALE**: Company legal name
- **TRIBUNALE**: Court identifier
- **LEGGE**: Law reference
- **N_SENTENZA**: Court judgment number
- **N_LICENZA**: License number
- **AVV_NOTAIO**: Lawyer or notary reference
- **REGIME_PATRIMONIALE**: Matrimonial property regime

### Medical Information
- **CARTELLA_CLINICA**: Medical record
- **MALATTIA**: Disease or medical condition
- **MEDICINA**: Medicine or medical treatment
- **STORIA_CLINICA**: Clinical history
- **STRENGTH**: Medicine strength
- **FREQUENZA**: Treatment frequency
- **DURATION**: Duration of treatment
- **DOSAGGIO**: Medicine dosage
- **FORM**: Medicine form (e.g., tablet)

### Technical Information
- **IP**: IP address
- **IPV6_1**: IPv6 address
- **MAC**: MAC address
- **USER_AGENT**: Browser user agent
- **IMEI**: Mobile device identifier

### Geographic and Temporal Data
- **STATO**: Country or nation
- **LUOGO**: Geographic location
- **ORARIO**: Specific time
- **DATA**: Generic date

### Document and Vehicle Information
- **NUMERO_DOCUMENTO**: Document number
- **TARGA_VEICOLO**: Vehicle license plate
- **FOGLIO**: Document sheet reference
- **PARTICELLA**: Land registry parcel
- **MAPPALE**: Land registry map reference
- **SUBALTERNO**: Land registry sub-unit reference

### Web and Security
- **URL**: Web address
- **PASSWORD**: Password
- **PIN**: Personal identification number
- **BRAND**: Commercial brand or trademark
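
The groupings above can be mirrored in code to keep only the labels relevant to one domain. The sketch below is illustrative rather than part of the model: `CATEGORY_GROUPS`, `labels_for`, and `keep_groups` are hypothetical names, only three of the groups are shown, and the label strings are assumed to match the `entity_group` values the model emits.

```python
# Hypothetical mapping from domain to label names, copied from the lists above.
# Only three groups are shown; extend it with the remaining categories as needed.
CATEGORY_GROUPS = {
    "contact": {"INDIRIZZO", "NUMERO_TELEFONO", "EMAIL", "CODICE_POSTALE"},
    "technical": {"IP", "IPV6_1", "MAC", "USER_AGENT", "IMEI"},
    "web_security": {"URL", "PASSWORD", "PIN", "BRAND"},
}


def labels_for(*groups):
    """Return the union of label names for the requested domain groups."""
    selected = set()
    for group in groups:
        selected |= CATEGORY_GROUPS[group]
    return selected


def keep_groups(entities, *groups):
    """Keep only entities whose `entity_group` belongs to the requested groups."""
    wanted = labels_for(*groups)
    return [entity for entity in entities if entity["entity_group"] in wanted]
```

For example, `keep_groups(ner_results, "contact", "technical")` would retain only contact and technical entities from a list of pipeline results.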

## 💻 Implementation

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DeepMount00/Italian_NER_XXL_v2")
model = AutoModelForTokenClassification.from_pretrained("DeepMount00/Italian_NER_XXL_v2")

# Create NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example text
example = """Il commendatore Gianluigi Alberico De Laurentis-Ponti, con residenza legale in Corso Imperatrice 67, 
Torino, avente codice fiscale DLNGGL60B01L219P, è amministratore delegato della "De Laurentis Advanced Engineering 
Group S.p.A.", che si trova in Piazza Affari 32, Milano (MI); con una partita IVA di 09876543210, la società è stata 
recentemente incaricata di sviluppare una nuova linea di componenti aerospaziali per il progetto internazionale 
di esplorazione di Marte."""

# Run NER
ner_results = nlp(example)

# Process results
for entity in ner_results:
    print(f"{entity['entity_group']}: {entity['word']} (confidence: {entity['score']:.4f})")
```
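
With `aggregation_strategy="simple"`, each result is a dictionary with `entity_group`, `word`, and `score` keys, as the loop above relies on. A minimal post-processing sketch follows: `group_by_label` and the 0.5 threshold are illustrative choices, and the sample list is hand-written in the pipeline's output shape, not real model output.

```python
from collections import defaultdict


def group_by_label(ner_results, min_score=0.5):
    """Group entity strings by label, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in ner_results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample in the pipeline's output shape (not real model output).
sample_results = [
    {"entity_group": "NOME", "word": "Gianluigi", "score": 0.99},
    {"entity_group": "P_IVA", "word": "09876543210", "score": 0.97},
    {"entity_group": "LUOGO", "word": "Marte", "score": 0.30},
]

grouped = group_by_label(sample_results)
print(grouped)
```

The threshold trades recall for precision; for redaction workloads a lower value may be preferable, so that uncertain matches are still caught.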

## 🚀 Use Cases
- **Privacy Compliance**: GDPR data mapping and PII detection
- **Document Anonymization**: Automated redaction of sensitive information
- **Legal Document Analysis**: Extraction of key entities from contracts and legal texts
- **Financial Monitoring**: Detection of financial entities for compliance and fraud prevention
- **Medical Record Processing**: Structured extraction from clinical notes and reports
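
Document anonymization can be sketched as replacing each detected span with its label. The example assumes every entity dictionary carries `start` and `end` character offsets, which the `transformers` token-classification pipeline provides when the model uses a fast tokenizer. `redact` is a hypothetical helper, and the entity list is hand-written rather than real model output.

```python
def redact(text, entities, placeholder="[{label}]"):
    """Replace each entity span in `text` with a label placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = placeholder.format(label=entity["entity_group"])
        text = text[: entity["start"]] + label + text[entity["end"]:]
    return text


text = "Contattare Mario Rossi a mario.rossi@example.com"

# Hand-written entities with character offsets (not real model output).
entities = [
    {"entity_group": "NOME", "start": 11, "end": 16},
    {"entity_group": "COGNOME", "start": 17, "end": 22},
    {"entity_group": "EMAIL", "start": 25, "end": 48},
]

redacted = redact(text, entities)
print(redacted)
```

This sketch assumes the spans do not overlap; overlapping predictions would need to be merged or resolved before redaction.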

## 🔮 Future Development
We're committed to continuous improvement of the model:
- Quarterly updates with further accuracy enhancements
- Expansion to include new entity types based on user feedback
- Development of domain-specific variants for specialized applications
- Integration of contextual entity linking capabilities

## 👥 Contribution and Contact
Your feedback is essential to improving this model. If you're interested in contributing, have suggestions, or need a customized NER solution, please contact:

Michele Montebovi  
Email: [montebovi.michele@gmail.com](mailto:montebovi.michele@gmail.com)

We welcome collaboration from the Italian NLP community to further enhance this tool and expand its applications across industries.

## 📝 Citation
If you use this model in your research or applications, please cite:

```bibtex
@misc{montebovi2025italiannerxxl,
  author = {Montebovi, Michele},
  title = {Italian\_NER\_XXL\_v2: A Comprehensive Named Entity Recognition Model for Italian},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/DeepMount00/Italian_NER_XXL_v2}}
}
```