ModernCamemBERT

ModernCamemBERT is a French language model pretrained on a large corpus of 1T tokens of high-quality French text. It is the French counterpart of the ModernBERT model. ModernCamemBERT was trained with the Masked Language Modeling (MLM) objective, using a 30% mask rate, on 1T tokens across 48 H100 GPUs. The training dataset combines French RedPajama-V2 filtered with heuristic and semantic filtering, French scientific documents from HALvest, and the French Wikipedia. Semantic filtering was performed by fine-tuning a BERT classifier on a document-quality dataset automatically labeled by Llama-3 70B. We also reuse the existing CamemBERTaV2 tokenizer. The model was first trained with a 1,024-token context length, which was later increased to 8,192 tokens during pretraining. More details about the training process can be found in the ModernCamemBERT paper.
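
For illustration only (this is not the released pretraining code), the 30% mask rate described above maps directly onto the standard Hugging Face MLM data collator; everything else in the snippet is a minimal sketch:

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("almanach/moderncamembert-base")

# Select 30% of tokens for masking, matching the pretraining setup described above
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.3
)

features = [tokenizer("Le camembert est un fromage français.")]
batch = collator(features)
# Labels are -100 everywhere except at the masked positions
print(batch["input_ids"], batch["labels"])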

The goal of ModernCamemBERT was to run a controlled study by pretraining ModernBERT on the same dataset as CamemBERTaV2, a DeBERTaV3-based French model, thereby isolating the effect of model design. Our results show that the previous model generation remains superior in sample efficiency and overall benchmark performance, with ModernBERT's primary advantage being faster training and inference speed. However, the newly proposed model still provides meaningful architectural improvements over earlier models such as the BERT- and RoBERTa-based CamemBERT/CamemBERTv2. Additionally, we observe that high-quality pretraining data accelerates convergence but does not significantly improve final performance, suggesting potential benchmark saturation.

We recommend using ModernCamemBERT for tasks that require a long context length or efficient inference. For other tasks, CamemBERTaV2 remains the best-performing model on most benchmarks and is still the recommended choice.

We release two versions of the model: almanach/moderncamembert-base and almanach/moderncamembert-cv2-base. The former is trained on the new high-quality 1T-token dataset, while the latter is trained on the original CamemBERTaV2 dataset. Both models share the same architecture and hyperparameters.

How to use

from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM

# Plain encoder for embeddings / feature extraction;
# use AutoModelForMaskedLM instead to keep the fill-mask head.
model = AutoModel.from_pretrained("almanach/moderncamembert-base")
tokenizer = AutoTokenizer.from_pretrained("almanach/moderncamembert-base")
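
As a hedged illustration (not taken from the model card), the masked-language-modeling head can be queried directly; the sketch below uses tokenizer.mask_token so no particular mask-token string is assumed:

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("almanach/moderncamembert-base")
mlm_model = AutoModelForMaskedLM.from_pretrained("almanach/moderncamembert-base")

# Build a French sentence with one masked position
text = f"Le camembert est un fromage {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = mlm_model(**inputs).logits

# Take the highest-scoring token at the masked position
mask_positions = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))

Since the final pretraining stage uses an 8,192-token context, long documents can be encoded in a single pass up to that length.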

Fine-tuning Results

Datasets: NER (FTB), the FLUE benchmark (XNLI, CLS, PAWS-X), and the French Question Answering Dataset (FQuAD).

Model                 FTB-NER   CLS     PAWS-X   XNLI    FQuAD (F1)   FQuAD (EM)
CamemBERT             89.97     94.62   91.36    81.95   80.98        62.51
CamemBERTa            90.33     94.92   91.67    82.00   81.15        62.01
CamemBERTv2           81.99     95.07   92.00    81.75   80.98        61.35
CamemBERTav2          93.40     95.63   93.06    84.82   83.04        64.29
ModernCamemBERT-CV2   92.17     94.86   92.71    82.85   81.68        62.00
ModernCamemBERT       91.33     94.92   92.52    83.62   82.19        62.66
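
As a hedged sketch of how such fine-tuning might be set up (the Hugging Face xnli dataset with its French configuration is an assumption, and the hyperparameters are illustrative, not the ones behind the reported results):

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("almanach/moderncamembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "almanach/moderncamembert-base", num_labels=3  # XNLI has 3 classes
)

# French XNLI: premise / hypothesis pairs with entailment labels (assumed dataset config)
xnli = load_dataset("xnli", "fr")

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

encoded = xnli.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="moderncamembert-xnli",
    per_device_train_batch_size=16,
    learning_rate=3e-5,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()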

Fine-tuned models are available in the following collection: ModernCamembert Models

Pretraining Codebase

We use the pretraining codebase from the ModernBERT repository for all ModernCamemBERT models.

Citation

@misc{antoun2025modernbertdebertav3examiningarchitecture,
      title={ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance}, 
      author={Wissam Antoun and Benoît Sagot and Djamé Seddah},
      year={2025},
      eprint={2504.08716},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.08716}, 
}