Draft Models
A collection of 24 tiny "draft" models for speculative decoding.
A 0.6B-parameter draft (speculative decoding) model for use with DeepSeek-V3-0324 and DeepSeek-V3. See DeepSeek-V3-DRAFT-0.6B-v2.0 for the models in transformers format, and a detailed explanation of how the model was created.
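
If you want to try the transformers-format models, transformers' assisted generation accepts a small assistant model alongside the large target. The sketch below is illustrative only: the repo ids, dtype and device settings are assumptions on my part, and it presumes the draft shares the target's tokenizer.

```python
# Minimal sketch of speculative (assisted) decoding with transformers.
# Repo ids, dtype and device settings here are illustrative assumptions;
# DeepSeek-V3 itself needs far more hardware than a single typical machine.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "deepseek-ai/DeepSeek-V3-0324"           # large target model
draft_id = "jukofyork/DeepSeek-V3-DRAFT-0.6B-v2.0"   # small draft model (assumed repo id)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain speculative decoding in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# assistant_model turns on assisted generation: the draft proposes tokens,
# the target verifies them, so the output matches target-only decoding.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```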
I've included the Q4_0 quants for 5 different context lengths. Qwen2.5-0.5B doesn't allow any of the other 4-bit quants to be made (and experimentation has shown that using more or less than 4 bits for speculative decoding is a waste of time anyway).

Because llama.cpp uses "static-YaRN", the scaling factor remains constant regardless of input length! Only use the longer-context versions when processing long contexts is required...
Base model: Qwen/Qwen2.5-0.5B