dhivehi-nougat-small-text-sen-multiline

This model is a fine-tuned version of facebook/nougat-small on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5761

Model description

More information needed

Intended uses & limitations

More information needed
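
No usage details are provided, but since the base model is facebook/nougat-small, inference presumably follows the standard Nougat image-to-text pipeline in transformers. The snippet below is a minimal sketch under that assumption; the image filename is a placeholder, and generation settings (max_new_tokens, banned unk token) are illustrative rather than taken from this repository.

```python
import torch
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

model_id = "alakxender/dhivehi-nougat-small-text-sen-multiline"
processor = NougatProcessor.from_pretrained(model_id)
model = VisionEncoderDecoderModel.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# "line_image.png" is a placeholder for an image of Dhivehi text.
image = Image.open("line_image.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

outputs = model.generate(
    pixel_values.to(device),
    max_new_tokens=512,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)
text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
text = processor.post_process_generation(text, fix_markdown=False)
print(text)
```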

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto transformers training arguments):

  • learning_rate: 5e-05
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 18
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 100
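
As a rough guide, these settings correspond to transformers training arguments roughly as sketched below. This is not the original training script: output_dir, the choice of Seq2SeqTrainingArguments over plain TrainingArguments, and the evaluation/logging cadence (inferred from the 500-step interval in the results table) are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: output_dir, eval/logging cadence, and the use of
# Seq2SeqTrainingArguments (vs. TrainingArguments) are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="dhivehi-nougat-small-text-sen-multiline",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=6,   # effective train batch size: 3 * 6 = 18
    num_train_epochs=100,
    lr_scheduler_type="linear",
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",           # evaluation every 500 steps, per the results table
    eval_steps=500,
    logging_steps=500,
)
```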

Training results

Training Loss   Epoch     Step     Validation Loss
4.9984          0.1915    500      0.7656
4.4373          0.3830    1000     0.6860
4.1498          0.5746    1500     0.6523
4.0507          0.7661    2000     0.6342
4.04            0.9576    2500     0.6209
3.9678          1.1490    3000     0.6135
3.9341          1.3405    3500     0.6075
3.8322          1.5320    4000     0.6028
3.864           1.7236    4500     0.6002
3.8866          1.9151    5000     0.5972
3.8513          2.1065    5500     0.5944
3.7987          2.2980    6000     0.5930
3.85            2.4895    6500     0.5911
3.7525          2.6811    7000     0.5901
3.8161          2.8726    7500     0.5884
3.774           3.0640    8000     0.5879
3.8582          3.2555    8500     0.5869
3.8049          3.4470    9000     0.5855
3.7704          3.6385    9500     0.5855
3.8524          3.8301    10000    0.5844
3.7806          4.0215    10500    0.5839
3.7578          4.2130    11000    0.5834
3.7702          4.4045    11500    0.5829
3.7018          4.5960    12000    0.5827
3.7466          4.7875    12500    0.5816
3.7695          4.9791    13000    0.5816
3.8066          5.1705    13500    0.5811
3.7632          5.3620    14000    0.5813
3.7906          5.5535    14500    0.5801
3.7567          5.7450    15000    0.5805
3.7465          5.9365    15500    0.5802
3.7318          6.1279    16000    0.5797
3.7349          6.3195    16500    0.5792
3.724           6.5110    17000    0.5795
3.7208          6.7025    17500    0.5793
3.7877          6.8940    18000    0.5788
3.8067          7.0854    18500    0.5788
3.7721          7.2769    19000    0.5782
3.7535          7.4685    19500    0.5781
3.7339          7.6600    20000    0.5778
3.7472          7.8515    20500    0.5784
3.7907          8.0429    21000    0.5780
3.7457          8.2344    21500    0.5778
3.7464          8.4259    22000    0.5777
3.7859          8.6175    22500    0.5771
3.7792          8.8090    23000    0.5775
3.4678          9.0004    23500    0.5773
3.734           9.1919    24000    0.5769
3.7741          9.3834    24500    0.5770
3.8595          9.5749    25000    0.5766
3.7799          9.7665    25500    0.5767
3.6788          9.9580    26000    0.5768
3.7228          10.1494   26500    0.5766
3.7604          10.3409   27000    0.5763
3.7169          10.5324   27500    0.5765
3.731           10.7240   28000    0.5765
3.7575          10.9155   28500    0.5760
3.9147          11.1069   29000    0.5759
3.6776          11.2984   29500    0.5762
3.7124          11.4899   30000    0.5763
3.7571          11.6814   30500    0.5761

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0