Snyhlxde/shiftedattn-10-23-7b-qwen2p5-coder-n16w16-distilln32w16-ar-1-cyclic-noise-all-1e-6 • Updated 14 days ago
Snyhlxde/shiftedattn-10-16-7b-qwen2p5-coder-n32w16-n16distill-data-v2-ar-1-cyclic-noise-all-1e-6 • Updated 28 days ago
Snyhlxde/aligned-8-24-7B-ntok64_ce_soft_loss_length_20k_flexattn_data_v1_4k_smp_ar_10_lr5-6 • 8B • Updated Aug 24 • 1
Snyhlxde/aligned-8-24-7B-ntok64_ce_soft_loss_length_20k_flexattn_data_v1_4k_smp_ar_1_only_lr5-6 • 8B • Updated Aug 24 • 1
Snyhlxde/8-23-openthinker2-7B-ntok64_ce_soft_loss_length_capped_16k_flexattn_data_v1_4kexample_ar_10 • 8B • Updated Aug 24 • 1
Snyhlxde/8-23-openthinker2-7B-ntok64_ce_soft_loss_length_capped_16k_flexattn_data_v1_4kexample_ar_only • 8B • Updated Aug 24 • 4
Snyhlxde/openthinker2-7B-ntok64_soft_ce_loss_length_20k_flexattn_data_v1_32k_sample_ar_10_c_d • 8B • Updated Aug 22 • 1
Snyhlxde/openthinker2-7B-ntok64_soft_ce_loss_length_20k_flexattn_data_v1_4k_sample_ar_10_c_d • 8B • Updated Aug 21 • 1
Snyhlxde/soft_loss_length_capped_16k_flexattn_data_v1_4k4samples_8_16_no_prompt_boundary • Updated Aug 17 • 1