Cascade0 DPO Instruct V2


(Trained on only 4.8B tokens.) This is an experimental DPO pass to see what difference it makes. In some chats, the new DPO pass actually helps and makes the model feel somewhat more conversational. For example, when asked 'How to make a salad', it responds: (example response screenshot)
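To try the model yourself, here is a minimal inference sketch (not from the card itself) using Hugging Face transformers. The chat-template call is an assumption; if the tokenizer doesn't define a chat template, format the prompt however the model was instruction-tuned instead.

```python
# Minimal chat sketch, assuming a standard transformers causal LM and that the
# tokenizer defines a chat template (an assumption, not confirmed by the card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ARMZyany/Cascade0-159M-DPO-Instruct-V2-0925"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "How to make a salad?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
# Keep prompt + generation inside the model's 1512-token context window.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```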

Training info

Pretraining took 2 weeks on a single RTX 4080. It was stopped at only 40% of the planned maximum of 12B tokens, reaching 4.8B tokens, due to potential overfitting.

The max context size is 1512 tokens.

SFT took two days and DPO took one day, both on an RTX 3060M.
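For readers unfamiliar with the DPO step, the sketch below shows what a pass like this typically looks like with TRL's DPOTrainer. The dataset, hyperparameters, and starting checkpoint are placeholders, not this model's actual recipe, and the API shown follows recent TRL versions.

```python
# Illustrative DPO pass with TRL; everything below is an assumed example
# setup, not the training configuration used for this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "ARMZyany/Cascade0-159M-DPO-Instruct-V2-0925"  # in practice, the SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any preference dataset with "prompt"/"chosen"/"rejected" columns works here.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = DPOConfig(
    output_dir="cascade0-dpo",
    per_device_train_batch_size=2,  # small batch to fit a laptop GPU like the 3060M
    max_length=1512,                # matches the model's context size
)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```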

Other small models vs. Cascade0-Series

All models were evaluated with LMEval Harness on the same PC and settings, using GGUF files with F16 quantization.
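As a rough sketch of how such numbers can be reproduced, the LM Eval Harness exposes a Python entry point, lm_eval.simple_evaluate. The card's runs used F16 GGUF files; this sketch uses the plain Hugging Face backend instead, and the task list is only an example, since the card does not state the exact benchmark set.

```python
# Hedged evaluation sketch with the LM Eval Harness Python API; the tasks and
# backend here are assumptions, not the card's exact configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ARMZyany/Cascade0-159M-DPO-Instruct-V2-0925,dtype=bfloat16",
    tasks=["hellaswag", "arc_easy"],
)
print(results["results"])
```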

(Benchmark comparison chart, made with LMEval Harness)

