cgus committed · verified
Commit d7fc577 · Parent(s): 2457489

Update README.md

Files changed (1): README.md (+17 -2)
README.md CHANGED
@@ -8,11 +8,11 @@ language:
 - it
 - pt
 base_model:
-- alamios/Qwenstral-Small-3.1-0.5B
+- alamios/Mistral-Small-3.1-DRAFT-0.5B
 datasets:
 - alamios/Mistral-Small-24B-Instruct-2501-Conversations
 pipeline_tag: text-generation
-library_name: transformers
+library_name: exllamav2
 tags:
 - qwen
 - qwen2.5
@@ -20,7 +20,22 @@ tags:
 - mistral-small
 - mistral-small-3.1
 ---
+# Mistral-Small-3.1-DRAFT-0.5B-exl2
+Original model: [Mistral-Small-3.1-DRAFT-0.5B](https://huggingface.co/alamios/Mistral-Small-3.1-DRAFT-0.5B) by [alamios](https://huggingface.co/alamios)
+Based on: [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) by [Qwen](https://huggingface.co/Qwen/Qwen2.5-0.5B)
 
+## Quants
+[4bpw h6 (main)](https://huggingface.co/cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2/tree/main)
+[5bpw h6](https://huggingface.co/cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2/tree/5bpw-h6)
+[6bpw h6](https://huggingface.co/cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2/tree/6bpw-h6)
+[8bpw h8](https://huggingface.co/cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2/tree/8bpw-h8)
+
+## Quantization notes
+Made with Exllamav2 using its default calibration dataset.
+These quants are meant to be used as a draft model for TabbyAPI.
+The 8bpw version with FP16 cache is probably the most reliable option for this purpose.
+
+## Original model card
 # Mistral-Small-3.1-DRAFT-0.5B
 
 This model is meant to be used as a draft model for speculative decoding with [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) or [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)
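The quantization notes above say these quants are intended as a TabbyAPI draft model. A minimal sketch of the relevant config fragment follows; the key names and the main-model name are assumptions based on TabbyAPI's sample configuration, not something stated in this commit, so verify them against the `config_sample.yml` shipped with your TabbyAPI version:

```yaml
# Hypothetical TabbyAPI config.yml fragment (key names and the main
# model name below are assumptions; check config_sample.yml).
model:
  model_dir: models
  model_name: Mistral-Small-24B-Instruct-2501-exl2  # placeholder main model

draft_model:
  draft_model_dir: models
  draft_model_name: Mistral-Small-3.1-DRAFT-0.5B-exl2
  # The notes above suggest the 8bpw quant with FP16 cache as the most
  # reliable pairing for speculative decoding.
```

A specific quant branch can be fetched with, for example, `huggingface-cli download cgus/Mistral-Small-3.1-DRAFT-0.5B-exl2 --revision 8bpw-h8`.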