collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd0

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9554
  • Num Input Tokens Seen: 29176432
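
As a quick usage reference, the sketch below shows one way to load this checkpoint with the Transformers library; the repository id is taken from the model name above, and the prompt, dtype, and generation settings are illustrative assumptions rather than part of this card.

```python
# Minimal loading sketch (assumes the repo id above and a bfloat16-capable GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: load in bf16 to fit a 9B model comfortably
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("Write a short sentence about model fine-tuning.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```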

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of how they might map to TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
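
For reference, here is a hedged sketch of how the values above might be expressed as Transformers TrainingArguments. The actual training script, dataset, and model/collator setup are not part of this card, so the argument mapping below is an assumption, not the author's code.

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# The real training script is not included in this card, so names here are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=4,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    seed=0,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption; not stated explicitly in the card
)
```

With the constant_with_warmup scheduler, the learning rate ramps up over the first 5% of optimizer steps (warmup_ratio 0.05) and then stays fixed at 8e-06 for the rest of the single epoch.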

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.5082 | 0.0086 | 5 | 1.2044 | 253900 |
| 1.2297 | 0.0173 | 10 | 1.0969 | 505088 |
| 1.0669 | 0.0259 | 15 | 1.0475 | 759448 |
| 0.9161 | 0.0346 | 20 | 1.0276 | 1013980 |
| 0.6699 | 0.0432 | 25 | 1.0402 | 1264204 |
| 0.5472 | 0.0518 | 30 | 1.0525 | 1517512 |
| 0.4024 | 0.0605 | 35 | 1.0378 | 1770520 |
| 0.3633 | 0.0691 | 40 | 1.0346 | 2012376 |
| 0.3315 | 0.0777 | 45 | 1.0235 | 2266452 |
| 0.3059 | 0.0864 | 50 | 1.0148 | 2521528 |
| 0.3055 | 0.0950 | 55 | 1.0137 | 2775812 |
| 0.2359 | 0.1037 | 60 | 1.0030 | 3027688 |
| 0.3126 | 0.1123 | 65 | 1.0022 | 3283636 |
| 0.2403 | 0.1209 | 70 | 0.9972 | 3543064 |
| 0.3407 | 0.1296 | 75 | 0.9931 | 3792188 |
| 0.2366 | 0.1382 | 80 | 0.9914 | 4039528 |
| 0.2589 | 0.1469 | 85 | 0.9923 | 4292552 |
| 0.2351 | 0.1555 | 90 | 0.9896 | 4546400 |
| 0.1684 | 0.1641 | 95 | 0.9880 | 4795116 |
| 0.172 | 0.1728 | 100 | 0.9846 | 5046368 |
| 0.19 | 0.1814 | 105 | 0.9834 | 5308588 |
| 0.2633 | 0.1901 | 110 | 0.9820 | 5559684 |
| 0.3733 | 0.1987 | 115 | 0.9821 | 5813992 |
| 0.2145 | 0.2073 | 120 | 0.9801 | 6070416 |
| 0.2155 | 0.2160 | 125 | 0.9830 | 6315596 |
| 0.2225 | 0.2246 | 130 | 0.9814 | 6569088 |
| 0.2722 | 0.2332 | 135 | 0.9773 | 6818112 |
| 0.2117 | 0.2419 | 140 | 0.9763 | 7063572 |
| 0.2697 | 0.2505 | 145 | 0.9750 | 7313212 |
| 0.1635 | 0.2592 | 150 | 0.9748 | 7563296 |
| 0.2014 | 0.2678 | 155 | 0.9749 | 7814756 |
| 0.2966 | 0.2764 | 160 | 0.9732 | 8071104 |
| 0.2152 | 0.2851 | 165 | 0.9732 | 8321916 |
| 0.2225 | 0.2937 | 170 | 0.9732 | 8575656 |
| 0.218 | 0.3024 | 175 | 0.9725 | 8829960 |
| 0.2213 | 0.3110 | 180 | 0.9709 | 9077472 |
| 0.2019 | 0.3196 | 185 | 0.9735 | 9327976 |
| 0.2356 | 0.3283 | 190 | 0.9727 | 9586088 |
| 0.282 | 0.3369 | 195 | 0.9703 | 9836880 |
| 0.1755 | 0.3456 | 200 | 0.9711 | 10084064 |
| 0.1982 | 0.3542 | 205 | 0.9711 | 10342304 |
| 0.2235 | 0.3628 | 210 | 0.9694 | 10594072 |
| 0.2343 | 0.3715 | 215 | 0.9692 | 10848940 |
| 0.2224 | 0.3801 | 220 | 0.9675 | 11105216 |
| 0.1573 | 0.3887 | 225 | 0.9683 | 11357564 |
| 0.232 | 0.3974 | 230 | 0.9683 | 11608688 |
| 0.2024 | 0.4060 | 235 | 0.9656 | 11861536 |
| 0.206 | 0.4147 | 240 | 0.9658 | 12109876 |
| 0.2774 | 0.4233 | 245 | 0.9673 | 12358492 |
| 0.2034 | 0.4319 | 250 | 0.9673 | 12613104 |
| 0.2507 | 0.4406 | 255 | 0.9648 | 12866204 |
| 0.2835 | 0.4492 | 260 | 0.9661 | 13119448 |
| 0.2383 | 0.4579 | 265 | 0.9680 | 13367496 |
| 0.2672 | 0.4665 | 270 | 0.9667 | 13620264 |
| 0.1784 | 0.4751 | 275 | 0.9643 | 13878680 |
| 0.1693 | 0.4838 | 280 | 0.9653 | 14127536 |
| 0.2884 | 0.4924 | 285 | 0.9677 | 14381756 |
| 0.2109 | 0.5011 | 290 | 0.9638 | 14643852 |
| 0.1975 | 0.5097 | 295 | 0.9641 | 14897344 |
| 0.2218 | 0.5183 | 300 | 0.9651 | 15142956 |
| 0.2154 | 0.5270 | 305 | 0.9652 | 15392580 |
| 0.1529 | 0.5356 | 310 | 0.9634 | 15649732 |
| 0.1644 | 0.5442 | 315 | 0.9660 | 15899204 |
| 0.2834 | 0.5529 | 320 | 0.9646 | 16150936 |
| 0.1629 | 0.5615 | 325 | 0.9613 | 16395960 |
| 0.1851 | 0.5702 | 330 | 0.9612 | 16655372 |
| 0.2276 | 0.5788 | 335 | 0.9634 | 16915404 |
| 0.2364 | 0.5874 | 340 | 0.9615 | 17171280 |
| 0.3287 | 0.5961 | 345 | 0.9599 | 17430220 |
| 0.2272 | 0.6047 | 350 | 0.9587 | 17676740 |
| 0.1756 | 0.6134 | 355 | 0.9613 | 17926836 |
| 0.2325 | 0.6220 | 360 | 0.9615 | 18180824 |
| 0.2313 | 0.6306 | 365 | 0.9595 | 18430524 |
| 0.1806 | 0.6393 | 370 | 0.9590 | 18684524 |
| 0.212 | 0.6479 | 375 | 0.9587 | 18939748 |
| 0.145 | 0.6566 | 380 | 0.9590 | 19193300 |
| 0.1975 | 0.6652 | 385 | 0.9595 | 19440700 |
| 0.2746 | 0.6738 | 390 | 0.9604 | 19694592 |
| 0.299 | 0.6825 | 395 | 0.9587 | 19945404 |
| 0.1257 | 0.6911 | 400 | 0.9578 | 20196008 |
| 0.2559 | 0.6997 | 405 | 0.9581 | 20442928 |
| 0.2001 | 0.7084 | 410 | 0.9594 | 20695556 |
| 0.2035 | 0.7170 | 415 | 0.9589 | 20943484 |
| 0.1544 | 0.7257 | 420 | 0.9574 | 21196736 |
| 0.2173 | 0.7343 | 425 | 0.9579 | 21449560 |
| 0.1656 | 0.7429 | 430 | 0.9585 | 21702020 |
| 0.2824 | 0.7516 | 435 | 0.9593 | 21952844 |
| 0.1876 | 0.7602 | 440 | 0.9601 | 22205932 |
| 0.2108 | 0.7689 | 445 | 0.9585 | 22454488 |
| 0.2672 | 0.7775 | 450 | 0.9576 | 22704452 |
| 0.1782 | 0.7861 | 455 | 0.9559 | 22955940 |
| 0.2339 | 0.7948 | 460 | 0.9549 | 23207052 |
| 0.2428 | 0.8034 | 465 | 0.9558 | 23456708 |
| 0.2038 | 0.8121 | 470 | 0.9555 | 23709712 |
| 0.2188 | 0.8207 | 475 | 0.9556 | 23963108 |
| 0.149 | 0.8293 | 480 | 0.9567 | 24215948 |
| 0.1509 | 0.8380 | 485 | 0.9577 | 24471656 |
| 0.1932 | 0.8466 | 490 | 0.9582 | 24719948 |
| 0.1685 | 0.8552 | 495 | 0.9556 | 24965208 |
| 0.1658 | 0.8639 | 500 | 0.9560 | 25218600 |
| 0.2438 | 0.8725 | 505 | 0.9582 | 25476704 |
| 0.2235 | 0.8812 | 510 | 0.9572 | 25724700 |
| 0.1904 | 0.8898 | 515 | 0.9544 | 25973760 |
| 0.2485 | 0.8984 | 520 | 0.9546 | 26231120 |
| 0.2104 | 0.9071 | 525 | 0.9548 | 26480832 |
| 0.1977 | 0.9157 | 530 | 0.9575 | 26738864 |
| 0.2057 | 0.9244 | 535 | 0.9570 | 26997660 |
| 0.1918 | 0.9330 | 540 | 0.9548 | 27253932 |
| 0.1763 | 0.9416 | 545 | 0.9556 | 27508012 |
| 0.1706 | 0.9503 | 550 | 0.9588 | 27758020 |
| 0.2287 | 0.9589 | 555 | 0.9556 | 28012216 |
| 0.213 | 0.9676 | 560 | 0.9543 | 28270144 |
| 0.1938 | 0.9762 | 565 | 0.9555 | 28520404 |
| 0.2117 | 0.9848 | 570 | 0.9572 | 28774464 |
| 0.2136 | 0.9935 | 575 | 0.9559 | 29028248 |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
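
For reproducibility, a small script like the one below could compare a local environment against the versions listed above; it is a sketch, and the +cu121 suffix on the PyTorch entry only denotes the CUDA 12.1 build, which depends on the install channel.

```python
# Quick environment check against the framework versions listed above.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0",       # card lists "2.4.0+cu121"; the suffix is the CUDA build
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for name, module in [("transformers", transformers), ("torch", torch),
                     ("datasets", datasets), ("tokenizers", tokenizers)]:
    print(name, module.__version__, "(expected", expected[name] + ")")
```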