Model Card AI-Sweden-Models/ModernBERT-large

We are continuing the pretraining of ModernBERT-large
on 1.2T Scandinavian tokens; the run is currently at
token 201,220,456,072 (16.8%).

You can experiment with the current model checkpoint,
but expect downstream performance to improve
as we get closer to the finish line :)
Latest training metrics:

```
[token=201220456072/1198510347252]:
  Train trainer/packing_efficiency: 0.9987
  Train time/batch: 96049
  Train time/sample: 290088647
  Train time/batch_in_epoch: 96049
  Train time/sample_in_epoch: 290088647
  Train time/token: 201218360672
  Train time/token_in_epoch: 201218360672
  Train trainer/device_train_microbatch_size: 2
  Train loss/train/total: 0.9735
  Train throughput/batches_per_sec: 0.4429
  Train throughput/samples_per_sec: 1336.2411
  Train throughput/device/batches_per_sec: 0.0138
  Train throughput/device/samples_per_sec: 41.7575
  Train throughput/tokens_per_sec: 928047.0762
  Train throughput/device/tokens_per_sec: 29001.4711
  Train time/train: 99.9818
  Train time/val: 0.0000
  Train time/total: 99.9818
  Train lr-StableAdamW/group0: 0.0002
  Train lr-StableAdamW/group1: 0.0002
  Train gradient_norms/l1_norm: 1775.0997
  Train gradient_norms/l2_norm: 0.2097
```
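For reference, the progress percentage can be reproduced from the logged token counters, and dividing global by per-device throughput suggests the run is spread across 32 devices (an inference from the logs above, not an officially stated detail):

```python
# Counters taken from the training log above.
tokens_seen = 201_220_456_072
tokens_total = 1_198_510_347_252

# Fraction of the planned 1.2T Scandinavian tokens processed so far.
progress = tokens_seen / tokens_total
print(f"{progress:.1%}")  # -> 16.8%

# Global vs. per-device token throughput implies the device count.
global_tps = 928047.0762   # Train throughput/tokens_per_sec
device_tps = 29001.4711    # Train throughput/device/tokens_per_sec
print(round(global_tps / device_tps))  # -> 32
```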
Model size: 396M params (Safetensors, F32)