ModernCamemBERT

ModernCamemBERT is a French language model pretrained on a large corpus of 1T tokens of high-quality French text. It is the French counterpart of the ModernBERT model. ModernCamemBERT was trained with the Masked Language Modeling (MLM) objective, using a 30% mask rate, on 1T tokens across 48 H100 GPUs. The training dataset combines French RedPajama-V2 filtered with heuristic and semantic filtering, French scientific documents from HALvest, and the French Wikipedia. Semantic filtering was performed by fine-tuning a BERT classifier on a document-quality dataset automatically labeled by Llama-3 70B. We also reuse the existing CamemBERTaV2 tokenizer. The model was first trained with a 1,024-token context length, which was later increased to 8,192 tokens during pretraining. More details about the training process can be found in the ModernCamemBERT paper.
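
For illustration only (this is not the released pretraining code), the 30% mask rate described above maps directly onto the standard Hugging Face MLM data collator; everything else in the snippet is a minimal sketch:

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("almanach/moderncamembert-base")

# Select 30% of tokens for masking, matching the pretraining setup described above
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.3
)

features = [tokenizer("Le camembert est un fromage français.")]
batch = collator(features)
# Labels are -100 everywhere except at the masked positions
print(batch["input_ids"], batch["labels"])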

The goal of ModernCamemBERT was to run a controlled study by pretraining ModernBERT on the same dataset as CamemBERTaV2, a DeBERTaV3-based French model, thereby isolating the effect of model design. Our results show that the previous model generation remains superior in sample efficiency and overall benchmark performance, with ModernBERT's primary advantage being faster training and inference speed. However, the newly proposed model still provides meaningful architectural improvements over earlier models such as the BERT- and RoBERTa-based CamemBERT/CamemBERTv2. Additionally, we observe that high-quality pretraining data accelerates convergence but does not significantly improve final performance, suggesting potential benchmark saturation.

We recommend using ModernCamemBERT for tasks that require a long context length or efficient inference. For other tasks, CamemBERTaV2 remains the best-performing model on most benchmarks and is still the recommended choice.

We release two versions of the model: almanach/moderncamembert-base and almanach/moderncamembert-cv2-base. The former is trained on the new high-quality 1T-token dataset, while the latter is trained on the original CamemBERTaV2 dataset. Both models share the same architecture and hyperparameters.

How to use

from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM

# Plain encoder for embeddings / feature extraction;
# use AutoModelForMaskedLM instead to keep the fill-mask head.
model = AutoModel.from_pretrained("almanach/moderncamembert-base")
tokenizer = AutoTokenizer.from_pretrained("almanach/moderncamembert-base")
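
As a hedged illustration (not taken from the model card), the masked-language-modeling head can be queried directly; the sketch below uses tokenizer.mask_token so no particular mask-token string is assumed:

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("almanach/moderncamembert-base")
mlm_model = AutoModelForMaskedLM.from_pretrained("almanach/moderncamembert-base")

# Build a French sentence with one masked position
text = f"Le camembert est un fromage {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = mlm_model(**inputs).logits

# Take the highest-scoring token at the masked position
mask_positions = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))

Since the final pretraining stage uses an 8,192-token context, long documents can be encoded in a single pass up to that length.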

Fine-tuning Results

Datasets: NER (FTB), the FLUE benchmark (XNLI, CLS, PAWS-X), and the French Question Answering Dataset (FQuAD).

Model                 FTB-NER   CLS     PAWS-X   XNLI    FQuAD (F1)   FQuAD (EM)
CamemBERT             89.97     94.62   91.36    81.95   80.98        62.51
CamemBERTa            90.33     94.92   91.67    82.00   81.15        62.01
CamemBERTv2           81.99     95.07   92.00    81.75   80.98        61.35
CamemBERTav2          93.40     95.63   93.06    84.82   83.04        64.29
ModernCamemBERT-CV2   92.17     94.86   92.71    82.85   81.68        62.00
ModernCamemBERT       91.33     94.92   92.52    83.62   82.19        62.66
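
As a hedged sketch of how such fine-tuning might be set up (the Hugging Face xnli dataset with its French configuration is an assumption, and the hyperparameters are illustrative, not the ones behind the reported results):

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("almanach/moderncamembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "almanach/moderncamembert-base", num_labels=3  # XNLI has 3 classes
)

# French XNLI: premise / hypothesis pairs with entailment labels (assumed dataset config)
xnli = load_dataset("xnli", "fr")

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

encoded = xnli.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="moderncamembert-xnli",
    per_device_train_batch_size=16,
    learning_rate=3e-5,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()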

Fine-tuned models are available in the following collection: ModernCamembert Models

Pretraining Codebase

We use the pretraining codebase from the ModernBERT repository for all ModernCamemBERT models.

Citation

@misc{antoun2025modernbertdebertav3examiningarchitecture,
      title={ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance}, 
      author={Wissam Antoun and Benoît Sagot and Djamé Seddah},
      year={2025},
      eprint={2504.08716},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.08716}, 
}