Cascade0 DPO Instruct V2


(Trained on only 4.8B tokens.) This is an experimental DPO pass to see what difference it makes. In some chats, the new DPO pass actually helps and makes the model feel somewhat more conversational. For example, when asked 'How to make a salad', it responds: (example response screenshot)
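To try the model yourself, here is a minimal inference sketch (not from the card itself) using Hugging Face transformers. The chat-template call is an assumption; if the tokenizer doesn't define a chat template, format the prompt however the model was instruction-tuned instead.

```python
# Minimal chat sketch, assuming a standard transformers causal LM and that the
# tokenizer defines a chat template (an assumption, not confirmed by the card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ARMZyany/Cascade0-159M-DPO-Instruct-V2-0925"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "How to make a salad?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
# Keep prompt + generation inside the model's 1512-token context window.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```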

Training info

Pretraining took 2 weeks on a single RTX 4080. It was stopped at only 40% of the planned maximum of 12B tokens, reaching 4.8B tokens, due to potential overfitting.

The max context size is 1512 tokens.

SFT took two days and DPO took one day, both on an RTX 3060M.
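For readers unfamiliar with the DPO step, the sketch below shows what a pass like this typically looks like with TRL's DPOTrainer. The dataset, hyperparameters, and starting checkpoint are placeholders, not this model's actual recipe, and the API shown follows recent TRL versions.

```python
# Illustrative DPO pass with TRL; everything below is an assumed example
# setup, not the training configuration used for this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "ARMZyany/Cascade0-159M-DPO-Instruct-V2-0925"  # in practice, the SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any preference dataset with "prompt"/"chosen"/"rejected" columns works here.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = DPOConfig(
    output_dir="cascade0-dpo",
    per_device_train_batch_size=2,  # small batch to fit a laptop GPU like the 3060M
    max_length=1512,                # matches the model's context size
)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```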

Other small models vs. Cascade0-Series

All models were evaluated with LMEval Harness on the same PC and settings, using GGUF files with F16 quantization.
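As a rough sketch of how such numbers can be reproduced, the LM Eval Harness exposes a Python entry point, lm_eval.simple_evaluate. The card's runs used F16 GGUF files; this sketch uses the plain Hugging Face backend instead, and the task list is only an example, since the card does not state the exact benchmark set.

```python
# Hedged evaluation sketch with the LM Eval Harness Python API; the tasks and
# backend here are assumptions, not the card's exact configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ARMZyany/Cascade0-159M-DPO-Instruct-V2-0925,dtype=bfloat16",
    tasks=["hellaswag", "arc_easy"],
)
print(results["results"])
```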

(Benchmark comparison chart, made with LMEval Harness)

