BluebrainAI/rotating-head-gp-norm-gpt2-medium-wikitext

Feature Extraction

rotating-head-gpt2

Generated from Trainer

rotating-head-gp-norm-gpt2-medium-wikitext

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.2113
Accuracy: 0.4180
Perplexity: 24.8108
Bleu: 0.1307
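
The reported perplexity is consistent with the common definition of perplexity as the exponential of the mean cross-entropy loss (in nats). The card does not state how the metric was computed, so the following is only a minimal sanity check under that assumption; the function name `perplexity_from_loss` is hypothetical.

```python
import math

def perplexity_from_loss(loss: float) -> float:
    """Perplexity as exp(mean cross-entropy loss in nats). Assumed definition."""
    return math.exp(loss)

reported_loss = 3.2113        # evaluation loss from this card
reported_perplexity = 24.8108 # evaluation perplexity from this card

computed = perplexity_from_loss(reported_loss)
# A small mismatch is expected because the loss is rounded to 4 decimals.
print(f"computed={computed:.4f} reported={reported_perplexity:.4f}")
```

The two values agree to within rounding of the loss, which suggests (but does not prove) that the evaluation script used this definition.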

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
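
To make the `linear` scheduler and `lr_scheduler_warmup_ratio` settings concrete, here is a small pure-Python sketch of a linear warmup followed by linear decay to zero. It is an illustration of the schedule shape, not the training code used for this model. The total step count of 8910 is an assumption (roughly 1782 steps per epoch over 5 epochs, estimated from the Epoch and Step columns of the results table), and `linear_schedule_lr` is a hypothetical helper.

```python
def linear_schedule_lr(step: int, total_steps: int, warmup_steps: int,
                       base_lr: float = 1e-4) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 8910                       # assumption, not stated in the card
WARMUP_STEPS = round(0.1 * TOTAL_STEPS)  # warmup ratio 0.1 from the card

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_schedule_lr(step, TOTAL_STEPS, WARMUP_STEPS))
```

Under these assumptions the learning rate would peak at 0.0001 once warmup ends and then fall linearly for the rest of training.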

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Perplexity	Bleu
5.9057	0.2806	500	5.7484	0.2234	313.6789	0.0477
4.8613	0.5612	1000	4.7455	0.2807	115.0632	0.0711
4.2976	0.8418	1500	4.2220	0.3187	68.1694	0.0837
3.9568	1.1223	2000	3.9271	0.3461	50.7582	0.0934
3.7919	1.4029	2500	3.7617	0.3626	43.0211	0.0942
3.6920	1.6835	3000	3.6573	0.3725	38.7561	0.1052
3.5939	1.9641	3500	3.5628	0.3818	35.2616	0.1094
3.4830	2.2447	4000	3.4932	0.3879	32.8924	0.1140
3.4251	2.5253	4500	3.4391	0.3933	31.1583	0.1204
3.3876	2.8058	5000	3.3855	0.3991	29.5323	0.1227
3.2719	3.0864	5500	3.3499	0.4020	28.5004	0.1246
3.2612	3.3670	6000	3.3160	0.4062	27.5488	0.1283
3.2373	3.6476	6500	3.2848	0.4095	26.7034	0.1288
3.2086	3.9282	7000	3.2598	0.4118	26.0453	0.1297
3.1402	4.2088	7500	3.2398	0.4146	25.5281	0.1344
3.1002	4.4893	8000	3.2246	0.4162	25.1447	0.1317
3.1099	4.7699	8500	3.2113	0.4180	24.8108	0.1307
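
The Epoch and Step columns above can be used to estimate how many optimizer steps make up one epoch, which the card does not state directly. The sketch below divides step by epoch for a few rows copied from the table; the resulting figures are estimates, and the variable names are illustrative.

```python
# (step, epoch) pairs copied from the training results table
rows = [(500, 0.2806), (4000, 2.2447), (8500, 4.7699)]

# Each row gives an independent estimate of steps per epoch.
estimates = [step / epoch for step, epoch in rows]
steps_per_epoch = round(sum(estimates) / len(estimates))

num_epochs = 5  # from the hyperparameter list
total_steps = steps_per_epoch * num_epochs

print(f"steps per epoch ~ {steps_per_epoch}, total steps ~ {total_steps}")
```

The last logged row (step 8500, epoch 4.7699) falls before this estimated total, so the final evaluation numbers reported on the card correspond to a checkpoint logged shortly before the end of the fifth epoch.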

Framework versions

Transformers 4.49.0
PyTorch 2.6.0+cu124
Datasets 3.3.2
Tokenizers 0.21.0

Model size: 355M params
Tensor type: F32 (Safetensors)