cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2

Text Generation · mistral-small-3.1 · 4-bit precision


cgus committed on Mar 24

Commit d7fc577 (verified) · 1 parent: 2457489

Update README.md

Files changed (1): README.md +17 -2

README.md CHANGED

@@ -8,11 +8,11 @@ language:
 - it
 - pt
 base_model:
-- alamios/Qwenstral-Small-3.1-0.5B
+- alamios/Mistral-Small-3.1-DRAFT-0.5B
 datasets:
 - alamios/Mistral-Small-24B-Instruct-2501-Conversations
 pipeline_tag: text-generation
-library_name: transformers
+library_name: exllamav2
 tags:
 - qwen
 - qwen2.5
@@ -20,7 +20,22 @@ tags:
 - mistral-small
 - mistral-small-3.1
 ---
+# Mistral-Small-3.1-DRAFT-0.5B-exl2
+Original model: [Mistral-Small-3.1-DRAFT-0.5B](https://huggingface.co/alamios/Mistral-Small-3.1-DRAFT-0.5B) by [alamios](https://huggingface.co/alamios)
+Based on: [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) by [Qwen](https://huggingface.co/Qwen/Qwen2.5-0.5B)
+## Quants
+[4bpw h6 (main)](https://huggingface.co/cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2/tree/main)
+[5bpw h6](https://huggingface.co/cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2/tree/5bpw-h6)
+[6bpw h6](https://huggingface.co/cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2/tree/6bpw-h6)
+[8bpw h8](https://huggingface.co/cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2/tree/8bpw-h8)
+## Quantization notes
+Made with Exllamav2 with default dataset.
+These quants are meant to be used as a draft model for TabbyAPI.
+8bpw version with FP16 cache probably might be the most reliable option for this purpose.
+## Original model card
 # Mistral-Small-3.1-DRAFT-0.5B
 This model is meant to be used as draft model for speculative decoding with [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) or [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)
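The README positions these quants as a draft model for speculative decoding. As a rough illustration of the role a draft model plays, here is a minimal sketch of *greedy* speculative decoding in plain Python. It is not the exllamav2 or TabbyAPI implementation. The toy `target` and `draft` functions are hypothetical stand-ins for Mistral-Small and this 0.5B draft, and each one returns an argmax token instead of logits.

```python
from typing import Callable, List

# Hypothetical "model": maps a token sequence to its greedy (argmax) next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Reference: plain greedy decoding with the target model only."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the
    target keeps the longest prefix it agrees with, then adds one token itself."""
    seq = list(prompt)
    goal = len(prompt) + max_new
    while len(seq) < goal:
        # 1. The cheap draft model proposes tokens autoregressively.
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(min(k, goal - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each proposed position (a real engine scores
        #    all of them in a single batched forward pass).
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(seq + proposal[:i]) != tok:
                break
            accepted += 1
        seq.extend(proposal[:accepted])
        # 3. The target adds its own token: the correction at the first
        #    mismatch, or a bonus token when every draft token was accepted.
        if len(seq) < goal:
            seq.append(target(seq))
    return seq


# Toy models over digits: the draft disagrees with the target after a 3.
def target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 10


def draft(seq: List[int]) -> int:
    return 5 if seq[-1] == 3 else (seq[-1] * 3 + 1) % 10


print(speculative_decode(target, draft, [1], 6))  # [1, 4, 3, 0, 1, 4, 3]
```

With greedy verification, the output always matches `greedy_decode` on the target alone. A draft that guesses wrong more often only lowers the number of accepted tokens per step, which costs speed, not correctness.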