distily_bench_obj_cross_v2.12_gpt2
This student model was distilled from the teacher model gpt2 using the Distily library. The training dataset is unspecified.
It achieves the following results on the evaluation set:
- eval_enwikippl: 665.9925
- eval_frwikippl: 995.4457
- eval_zhwikippl: 405.3946
- eval_tinystoriesppl: 1100.5725
- eval_loss: 1.3024
- eval_runtime: 12.5753
- eval_samples_per_second: 47.713
- eval_steps_per_second: 11.928
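For reference, a minimal way to load and query this checkpoint with the transformers library (the repo id is taken from the title above; the prompt is purely illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.12_gpt2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Distillation compresses a large model by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```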
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)). In other words, the training loss is pure KL divergence on the logits; the hidden-state (hs) and attention (attn) components have zero weight (see the sketch after this list).
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
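A minimal PyTorch sketch of such a logits-only KL loss, assuming forward KL(teacher ‖ student) at temperature 1; this is an illustration, not Distily's actual implementation:

```python
import torch
import torch.nn.functional as F

def kl_logits_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) averaged over token positions.

    Both inputs have shape (batch, seq_len, vocab_size). The KL direction
    and the temperature default are assumptions; the hyperparameters above
    only say loss_fn=kl.
    """
    # Flatten (batch, seq_len) into one axis so batchmean averages per token.
    s = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    t = F.log_softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    return F.kl_div(s, t, log_target=True, reduction="batchmean") * temperature ** 2
```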
Resource Usage
Peak GPU Memory: 3.9293 GB
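A figure like this can be measured with PyTorch's CUDA allocator statistics; a minimal sketch (the 3.9293 GB above comes from the actual training run, not from this snippet):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the training loop (or a representative step) here ...

peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```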
Eval-Phase Metrics
step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
---|---|---|---|---|---|---|---|---|---|
teacher eval | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
0 | 0 | 147374.6094 | 4251118206976.0 | 19.8108 | 12.6652 | 47.374 | 11.843 | 74.6838 | 6171058503680.0 |
1500 | 0.0253 | 1012.5726 | 4501.9321 | 2.2064 | 12.5479 | 47.817 | 11.954 | 1084.7205 | 39061.2969 |
3000 | 0.0505 | 761.3547 | 2880.7776 | 1.7218 | 12.6141 | 47.566 | 11.891 | 932.5889 | 1552.8525 |
4500 | 0.0758 | 682.1792 | 1444.0309 | 1.5343 | 12.6458 | 47.447 | 11.862 | 963.2644 | 421.1599 |
6000 | 0.1010 | 673.6849 | 1216.2458 | 1.4424 | 12.6927 | 47.271 | 11.818 | 1035.7787 | 983.8034 |
7500 | 0.1263 | 630.5226 | 924.8793 | 1.3688 | 12.561 | 47.767 | 11.942 | 971.2607 | 351.8923 |
9000 | 0.1515 | 665.9925 | 995.4457 | 1.3024 | 12.5753 | 47.713 | 11.928 | 1100.5725 | 405.3946 |
10500 | 0.1768 | 649.4595 | 870.4929 | 1.2363 | 12.5912 | 47.652 | 11.913 | 1147.8689 | 379.8699 |
12000 | 0.2020 | 552.0709 | 756.2815 | 1.1687 | 12.5514 | 47.804 | 11.951 | 915.4786 | 247.3208 |
13500 | 0.2273 | 574.5076 | 775.2103 | 1.1446 | 12.6584 | 47.399 | 11.85 | 1022.3383 | 258.0553 |
15000 | 0.2525 | 570.0630 | 872.7639 | 1.1033 | 12.573 | 47.721 | 11.93 | 1034.7090 | 205.1337 |
16500 | 0.2778 | 524.1483 | 695.0405 | 1.0708 | 12.5445 | 47.83 | 11.957 | 960.6801 | 179.8155 |
18000 | 0.3030 | 558.0261 | 722.4153 | 1.0562 | 12.6414 | 47.463 | 11.866 | 1092.5500 | 238.2534 |
19500 | 0.3283 | 535.8491 | 646.8846 | 1.0133 | 12.5343 | 47.869 | 11.967 | 1038.2650 | 224.3871 |
21000 | 0.3535 | 498.7090 | 643.3860 | 0.9866 | 12.6044 | 47.602 | 11.901 | 945.8655 | 325.0199 |
22500 | 0.3788 | 501.5469 | 612.7169 | 0.9680 | 12.5367 | 47.86 | 11.965 | 979.3635 | 253.6864 |
24000 | 0.4040 | 376.6320 | 629.0483 | 0.9542 | 12.5557 | 47.787 | 11.947 | 639.3351 | 209.0216 |
25500 | 0.4293 | 481.3532 | 705.2970 | 0.9196 | 12.6849 | 47.3 | 11.825 | 966.3749 | 375.7875 |
27000 | 0.4545 | 459.1099 | 522.3182 | 0.8577 | 12.5747 | 47.715 | 11.929 | 958.1420 | 189.4054 |
28500 | 0.4798 | 413.4502 | 431.4271 | 0.7560 | 12.5416 | 47.841 | 11.96 | 891.3210 | 176.5119 |
30000 | 0.5051 | 403.5616 | 415.3713 | 0.7195 | 12.548 | 47.817 | 11.954 | 882.3771 | 152.6556 |
31500 | 0.5303 | 406.3142 | 383.7035 | 0.7008 | 12.7238 | 47.156 | 11.789 | 912.3057 | 155.9905 |
33000 | 0.5556 | 424.4844 | 373.8076 | 0.6957 | 12.5614 | 47.765 | 11.941 | 974.8803 | 171.0759 |
34500 | 0.5808 | 403.1555 | 398.5213 | 0.6867 | 12.5658 | 47.748 | 11.937 | 913.2111 | 178.8704 |
36000 | 0.6061 | 399.7424 | 356.4906 | 0.6771 | 12.5757 | 47.711 | 11.928 | 904.7578 | 169.4632 |
37500 | 0.6313 | 398.5905 | 372.6379 | 0.6750 | 12.652 | 47.423 | 11.856 | 912.7961 | 158.8251 |
39000 | 0.6566 | 392.1436 | 371.0796 | 0.6723 | 12.6742 | 47.34 | 11.835 | 882.8148 | 176.4061 |
40500 | 0.6818 | 393.4750 | 371.6812 | 0.6672 | 12.6703 | 47.355 | 11.839 | 901.9575 | 134.3779 |
42000 | 0.7071 | 399.2395 | 357.3452 | 0.6651 | 12.6545 | 47.414 | 11.853 | 913.0604 | 135.6295 |
43500 | 0.7323 | 391.1350 | 370.6879 | 0.6558 | 12.6748 | 47.338 | 11.834 | 896.4939 | 156.0113 |
45000 | 0.7576 | 382.1500 | 345.0898 | 0.6354 | 12.6893 | 47.284 | 11.821 | 884.7507 | 140.7350 |
46500 | 0.7828 | 379.9360 | 334.1126 | 0.6281 | 12.6503 | 47.43 | 11.857 | 877.5396 | 127.1069 |
48000 | 0.8081 | 379.3625 | 342.2339 | 0.6241 | 12.6749 | 47.338 | 11.834 | 882.8514 | 128.6507 |
49500 | 0.8333 | 379.1130 | 333.6659 | 0.6222 | 12.6951 | 47.262 | 11.816 | 881.2473 | 125.1969 |
51000 | 0.8586 | 378.2769 | 332.6569 | 0.6217 | 12.6252 | 47.524 | 11.881 | 883.0703 | 128.0856 |
52500 | 0.8838 | 377.0043 | 335.4331 | 0.6182 | 12.6655 | 47.373 | 11.843 | 880.3371 | 128.4364 |
54000 | 0.9091 | 376.5811 | 333.1023 | 0.6165 | 12.6459 | 47.446 | 11.862 | 877.0681 | 129.0633 |
55500 | 0.9343 | 377.9547 | 333.2431 | 0.6157 | 12.6412 | 47.464 | 11.866 | 883.1432 | 127.1832 |
57000 | 0.9596 | 378.2183 | 332.4462 | 0.6147 | 12.6477 | 47.439 | 11.86 | 884.0200 | 126.3209 |
58500 | 0.9848 | 377.9839 | 333.1023 | 0.6146 | 12.6522 | 47.422 | 11.856 | 883.7274 | 126.2198 |
59400 | 1.0 | 378.0425 | 333.0085 | 0.6147 | 12.651 | 47.427 | 11.857 | 883.7274 | 126.2198 |
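The *ppl columns report token-level perplexity, i.e. exp of the mean cross-entropy on the corresponding eval split. A hedged sketch of that computation (the exact eval texts, sequence length, and batching are not specified on this card):

```python
import math
import torch

@torch.no_grad()
def perplexity(model, tokenizer, texts, max_length=1024, device="cuda"):
    """exp(mean next-token cross-entropy) over a list of strings."""
    model.to(device).eval()
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt",
                        truncation=True, max_length=max_length).to(device)
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy over the shifted next-token targets.
        out = model(**enc, labels=enc["input_ids"])
        n_targets = enc["input_ids"].numel() - 1
        total_nll += out.loss.item() * n_targets
        total_tokens += n_targets
    return math.exp(total_nll / total_tokens)
```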
Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.21.0
Base model: openai-community/gpt2