Tags: GGUF · draft · speculative-decoding · conversational

A 0.6B-parameter draft model for speculative decoding with DeepSeek-V3-0324 and DeepSeek-V3.

See DeepSeek-V3-DRAFT-0.6B-v2.0 for the model in transformers format, along with a detailed explanation of how it was created.
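To use it, pass it alongside the target model when starting llama.cpp. Below is a minimal sketch launching llama-server from Python: the filenames are hypothetical placeholders, and draft-related flag names have changed between llama.cpp versions, so check `llama-server --help` on your build.

```python
# Minimal sketch: serve DeepSeek-V3 with this GGUF as the speculative
# decoding draft model. Filenames below are hypothetical placeholders.
import subprocess

subprocess.run([
    "./llama-server",
    "-m",  "DeepSeek-V3-0324-Q4_K_M.gguf",           # target model (placeholder name)
    "-md", "DeepSeek-V3-DRAFT-0.6B-v2.0-Q4_0.gguf",  # this draft model
    "--draft-max", "16",  # maximum number of tokens to draft per step
    "--draft-min", "5",   # minimum number of drafted tokens to use
])
```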


I've included Q4_0 quants for 5 different context lengths (see the file list in this repository).


NOTES:

  • The 14 heads of Qwen2.5-0.5B don't allow any of the other 4-bit quants to be made (and experimentation has shown that using more or fewer than 4 bits for speculative decoding is a waste of time anyway); see the divisibility check after this list.
  • Because llama.cpp uses "static" YaRN, the scaling factor remains constant regardless of input length! Only use the longer-context versions when processing long contexts is actually required; see the scaling sketch after this list.
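For the curious, the K-quant restriction is a simple divisibility issue: llama.cpp's K-quants (Q4_K, Q5_K, ...) pack each tensor row into 256-weight super-blocks, while Q4_0 uses 32-weight blocks. A quick check, assuming Qwen2.5-0.5B's published shape of 14 heads × 64 dims per head (hidden size 896):

```python
# Why only Q4_0: K-quants need every tensor row to be a multiple of
# QK_K = 256 weights, but Qwen2.5-0.5B's rows are 14 * 64 = 896 wide.
QK_K = 256   # K-quant super-block size in llama.cpp
QK4_0 = 32   # Q4_0 block size

hidden = 14 * 64          # 896
print(hidden % QK_K)      # 128 -> not divisible, so no Q4_K etc.
print(hidden % QK4_0)     # 0   -> Q4_0 blocks fit exactly
```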
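And to make the static-YaRN point concrete, here's an illustrative sketch (the function names are mine, not llama.cpp's; the 32k training context is Qwen2.5's published figure):

```python
# "Static" YaRN: llama.cpp fixes the RoPE scaling factor at load time
# from the GGUF metadata, regardless of how short the actual prompt is.
def static_yarn_scale(train_ctx: int, model_ctx: int) -> float:
    return model_ctx / train_ctx          # constant for the whole session

# A hypothetical "dynamic" scheme would adapt to the live sequence length:
def dynamic_scale(train_ctx: int, seq_len: int) -> float:
    return max(1.0, seq_len / train_ctx)  # 1.0 until the prompt outgrows train_ctx

print(static_yarn_scale(32_768, 131_072))  # 4.0, even for a 10-token prompt
print(dynamic_scale(32_768, 1_024))        # 1.0 for short prompts
```

This is why you should pick the shortest-context version that covers your workload: a long-context build pays the scaling cost even on short prompts.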
Model size: 590M params
Architecture: qwen2
Quantization: 4-bit (Q4_0)

Base model: Qwen/Qwen2.5-0.5B