Fine-tuned ("hyperfitted") with the methodology from https://arxiv.org/abs/2412.04318, using the OrthoGrad optimizer (https://arxiv.org/abs/2501.04697).
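
For reference, the core of OrthoGrad as described in the paper is a gradient projection: each parameter's gradient is replaced by its component orthogonal to the weight vector, rescaled back to the original gradient norm, before the base optimizer step. The sketch below is a minimal plain-PyTorch illustration of that idea; the function name and wrapper style are illustrative, not this repo's actual training code.

```python
import torch

@torch.no_grad()
def orthograd_(params, eps: float = 1e-30):
    """In-place OrthoGrad projection (arXiv:2501.04697): replace each
    gradient with its component orthogonal to the weight vector, then
    rescale it to the original gradient norm. Call this between
    loss.backward() and optimizer.step()."""
    for p in params:
        if p.grad is None:
            continue
        w = p.reshape(-1)
        g = p.grad.reshape(-1)
        # Remove the component of g parallel to w ...
        g_orth = g - (torch.dot(w, g) / (torch.dot(w, w) + eps)) * w
        # ... and rescale so the update magnitude matches the raw gradient.
        g_orth *= g.norm() / (g_orth.norm() + eps)
        p.grad.copy_(g_orth.reshape_as(p.grad))
```

Usage with any base optimizer (e.g. AdamW): call `loss.backward()`, then `orthograd_(model.parameters())`, then `optimizer.step()`.
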
Updated 23.02.2025: same dataset, now chunked into 512-token sequences with a 64-token sliding window (training loss still decreased). Significant HellaSwag drop (~22%).
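
The card does not show the exact chunking code. The sketch below assumes "64-token sliding window" means the window start advances by 64 tokens, so consecutive 512-token sequences overlap by 448 tokens; the function name is hypothetical.

```python
def sliding_windows(token_ids: list[int], seq_len: int = 512, stride: int = 64) -> list[list[int]]:
    """Split a token stream into overlapping training sequences:
    each window is seq_len tokens long and starts stride tokens after
    the previous one (assumed here: 512-token windows, 448-token overlap)."""
    return [token_ids[i:i + seq_len]
            for i in range(0, len(token_ids) - seq_len + 1, stride)]
```
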