Moderncamembert-4entities

Model Description

We present Moderncamembert-4entities, a Moderncamembert-cv2-base model fine-tuned for French Named Entity Recognition (NER) on four French NER datasets, covering 4 entity types (LOC, PER, ORG, MISC).
These datasets were concatenated and cleaned into a single dataset named frenchNER_4entities.
It contains 384,773 rows in total: 328,757 for training, 24,131 for validation and 31,885 for testing.
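
As a quick sanity check on the split sizes quoted above, the following sketch recomputes the total and the share of each split. It uses only the three counts given in this card; the variable names are illustrative.

```python
# Split sizes as reported in this model card.
splits = {"train": 328_757, "validation": 24_131, "test": 31_885}

# The three splits should add up to the reported total number of rows.
total = sum(splits.values())

# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)
print(shares)
```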

Evaluation results

The evaluation was carried out using the evaluate Python package.

frenchNER_4entities

For space reasons, the table shows only the F1 score of each model. The full results for this model are given below the table.

Model	Parameters	Context	PER	LOC	ORG	MISC
Jean-Baptiste/camembert-ner	110M	512 tokens	0.971	0.947	0.902	0.663
cmarkea/distilcamembert-base-ner	67.5M	512 tokens	0.974	0.948	0.892	0.658
NERmembert-base-4entities	110M	512 tokens	0.978	0.958	0.903	0.814
NERmembert2-4entities	111M	1024 tokens	0.978	0.958	0.901	0.806
NERmemberta-4entities	111M	1024 tokens	0.979	0.961	0.915	0.812
Moderncamembert-4entities (this model)	136M	8192 tokens	0.981	0.960	0.913	0.811
NERmembert-large-4entities	336M	512 tokens	0.982	0.964	0.919	0.834

Full results


{'LOC': {'precision': 0.9565485362095532,
  'recall': 0.9639751552795031,
  'f1': 0.9602474864655839,
  'number': 54740},
 'MISC': {'precision': 0.8599987367357251,
  'recall': 0.7680873268834796,
  'f1': 0.8114486642728371,
  'number': 35453},
 'O': {'precision': 0.9908647492910065,
  'recall': 0.9941133167897094,
  'f1': 0.9924863747765278,
  'number': 805547},
 'ORG': {'precision': 0.9089921444091593,
  'recall': 0.9175031632222691,
  'f1': 0.913227824188741,
  'number': 11855},
 'PER': {'precision': 0.97616260010303,
  'recall': 0.9855785143505603,
  'f1': 0.9808479600959955,
  'number': 63447},
 'overall_precision': 0.9826691327460604,
 'overall_recall': 0.9826691327460604,
 'overall_f1': 0.9826691327460604,
 'overall_accuracy': 0.9826691327460604}
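
To help read these figures, here is a minimal pure-Python sketch of per-label precision, recall and F1 computed token by token, with 'O' scored like any other label. It is an illustration under that assumption, not the evaluation code used for this card. The function `token_level_report` and the toy label sequences are hypothetical. Under this scoring, every wrong token counts once as a false positive and once as a false negative, so the micro-averaged precision and recall both equal the accuracy, which is consistent with the four identical overall values reported above.

```python
from collections import Counter


def token_level_report(y_true, y_pred):
    """Per-label precision/recall/F1 and micro-averaged scores over all tokens."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted label was wrong
            fn[gold] += 1  # the gold label was missed
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        report[label] = {"precision": p, "recall": r, "f1": f1, "number": actual}
    n_tp, n_fp, n_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    report["overall_precision"] = n_tp / (n_tp + n_fp)
    report["overall_recall"] = n_tp / (n_tp + n_fn)
    report["overall_accuracy"] = n_tp / len(y_true)
    return report


# Toy sequences (hypothetical, not taken from frenchNER_4entities).
gold = ["PER", "PER", "O", "LOC", "O", "ORG", "MISC", "O"]
pred = ["PER", "O", "O", "LOC", "O", "ORG", "O", "LOC"]
report = token_level_report(gold, pred)
print(report["PER"])
print(report["overall_precision"], report["overall_recall"], report["overall_accuracy"])

# F1 is the harmonic mean of precision and recall; recompute it for LOC
# from the precision and recall reported in this card.
loc_p, loc_r = 0.9565485362095532, 0.9639751552795031
loc_f1 = 2 * loc_p * loc_r / (loc_p + loc_r)
print(loc_f1)
```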

Usage

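from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
ner = pipeline(
    "token-classification",
    model="CATIE-AQ/Moderncamembert_4entities",
    tokenizer="CATIE-AQ/Moderncamembert_4entities",
    aggregation_strategy="simple",
)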

result = ner(
"Le dévoilement du logo officiel des JO s'est déroulé le 21 octobre 2019 au Grand Rex. Ce nouvel emblème et cette nouvelle typographie ont été conçus par le designer Sylvain Boyer avec les agences Royalties & Ecobranding. Rond, il rassemble trois symboles : une médaille d'or, la flamme olympique et Marianne, symbolisée par un visage de femme mais privée de son bonnet phrygien caractéristique. La typographie dessinée fait référence à l'Art déco, mouvement artistique des années 1920, décennie pendant laquelle ont eu lieu pour la dernière fois les Jeux olympiques à Paris en 1924. Pour la première fois, ce logo sera unique pour les Jeux olympiques et les Jeux paralympiques."
)

print(result)
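
With an aggregation strategy set, the token-classification pipeline returns a list of dictionaries with the keys entity_group, score, word, start and end. The sketch below shows one possible way to group such a list by entity type. The helper `group_entities`, its `min_score` parameter, the sample sentence and the scores are all hypothetical and are not real output from this model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group pipeline-style entity dicts by label, keeping only confident ones."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hypothetical sentence used only to build a sample in the pipeline's output shape.
text = "Sylvain Boyer a présenté le logo à Paris."


def make_entity(word, label, score):
    """Build one entity dict; character offsets are computed from the text itself."""
    start = text.index(word)
    return {
        "entity_group": label,
        "score": score,  # made-up confidence value
        "word": word,
        "start": start,
        "end": start + len(word),
    }


sample = [
    make_entity("Sylvain Boyer", "PER", 0.99),
    make_entity("Paris", "LOC", 0.60),
]

print(group_entities(sample))
print(group_entities(sample, min_score=0.9))
```

In practice, `sample` would be replaced by the `result` list returned by the pipeline call above.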

Environmental Impact

Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were used to estimate the carbon impact.

Hardware Type: A100 PCIe 40/80GB
Hours used: 2h48min
Cloud Provider: Private Infrastructure
Carbon Efficiency (kg/kWh): 0.032 (estimated from electricitymaps for April 15, 2025)
Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid): 0.022 kg eq. CO2
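
The formula above (power consumption x time x carbon efficiency) can be checked with a short calculation. The runtime and carbon efficiency are taken from this card. The GPU power draw is not stated here, so the 250 W value below is an assumption, chosen because it reproduces the reported figure.

```python
# Figures reported in this model card.
hours = 2 + 48 / 60        # 2h48min of training
carbon_efficiency = 0.032  # kg CO2 eq. per kWh

# ASSUMPTION: average draw of 250 W for the A100 PCIe card.
# This value is not given in the card.
assumed_power_kw = 0.250

# Power consumption x Time x Carbon efficiency of the power grid.
energy_kwh = assumed_power_kw * hours
emissions_kg = energy_kwh * carbon_efficiency

print(f"{energy_kwh:.2f} kWh, {emissions_kg:.3f} kg CO2 eq.")
```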

Citations