# Model Card: AI-Sweden-Models/ModernBERT-large
We are continuing the pretraining of ModernBERT-large on 1.2T Scandinavian tokens, currently at token 201,220,456,072 (16.8%). You can play around with the current model checkpoint, but expect downstream performance to improve as we get closer to the finish line :)
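If you want to try the checkpoint, below is a minimal fill-mask sketch using the Hugging Face `transformers` pipeline. Note that ModernBERT support requires a recent `transformers` (v4.48+), and the Swedish prompt is only an illustrative example, not from this card:

```python
from transformers import pipeline

# Minimal fill-mask sketch; ModernBERT needs transformers >= 4.48.
# The Swedish prompt below is illustrative only.
fill_mask = pipeline("fill-mask", model="AI-Sweden-Models/ModernBERT-large")

for pred in fill_mask("Huvudstaden i Sverige är [MASK]."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```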
## Training metrics

Latest logged values at token 201,220,456,072 of 1,198,510,347,252:

| Train metric | Value |
| --- | --- |
| trainer/packing_efficiency | 0.9987 |
| time/batch | 96049 |
| time/sample | 290088647 |
| time/batch_in_epoch | 96049 |
| time/sample_in_epoch | 290088647 |
| time/token | 201218360672 |
| time/token_in_epoch | 201218360672 |
| trainer/device_train_microbatch_size | 2 |
| loss/train/total | 0.9735 |
| throughput/batches_per_sec | 0.4429 |
| throughput/samples_per_sec | 1336.2411 |
| throughput/device/batches_per_sec | 0.0138 |
| throughput/device/samples_per_sec | 41.7575 |
| throughput/tokens_per_sec | 928047.0762 |
| throughput/device/tokens_per_sec | 29001.4711 |
| time/train | 99.9818 |
| time/val | 0.0000 |
| time/total | 99.9818 |
| lr-StableAdamW/group0 | 0.0002 |
| lr-StableAdamW/group1 | 0.0002 |
| gradient_norms/l1_norm | 1775.0997 |
| gradient_norms/l2_norm | 0.2097 |
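As a rough aid to reading these numbers, here is a back-of-envelope sketch (our own arithmetic, not part of the training logs) of progress and remaining wall-clock time, assuming throughput stays at the reported rate:

```python
# Back-of-envelope progress/ETA from the numbers reported above,
# assuming the logged throughput stays constant (it likely won't exactly).
total_tokens   = 1_198_510_347_252  # 1.2T-token target
trained_tokens =   201_220_456_072  # current checkpoint
tokens_per_sec = 928_047            # Train throughput/tokens_per_sec

remaining_sec = (total_tokens - trained_tokens) / tokens_per_sec
print(f"progress:  {trained_tokens / total_tokens:.1%}")  # ~16.8%
print(f"remaining: ~{remaining_sec / 86_400:.1f} days")   # ~12.4 days
```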
## Base model

answerdotai/ModernBERT-large