ThomasBaruzier/DeepScaleR-1.5B-Preview-GGUF

GGUF

conversational


ThomasBaruzier committed on Feb 27

Commit e9ca865 (verified) · Parent: c194c4a

Update README.md


Files changed (1)

README.md +0 -132

README.md CHANGED

@@ -1,132 +0,0 @@
----
-license: mit
-train: false
-inference: true
-pipeline_tag: text-generation
-base_model:
-- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
----
-<br><img src="https://cdn-uploads.huggingface.co/production/uploads/646410e04bf9122922289dc7/FHc3IG1KAJn6N3s1TJLrS.webp" width="720"><br>
-# Llama.cpp imatrix quantizations of [mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1](https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1)
-Using llama.cpp commit [3ad5451](https://github.com/ggerganov/llama.cpp/commit/3ad5451) for quantization.
-All quants were made using the imatrix option and Bartowski's [calibration file](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).
-<hr>
-# Perplexity table (the lower the better)
-| Quant                                                                                                                                                  | Size (MB) | PPL     | Size (%) | Accuracy (%) | PPL error rate |
-| ------------------------------------------------------------------------------------------------------------------------------------------------------ | --------- | ------- | -------- | ------------ | -------------- |
-| [IQ1_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ1_S.gguf)     | 489       | 88.4250 | 14.40    | 23.35        | 1.76           |
-| [IQ1_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ1_M.gguf)     | 516       | 53.8278 | 15.19    | 38.35        | 1.03           |
-| [IQ2_XXS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ2_XXS.gguf) | 560       | 45.5693 | 16.49    | 45.31        | 0.93           |
-| [IQ2_XS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ2_XS.gguf)   | 598       | 32.6813 | 17.61    | 63.17        | 0.62           |
-| [IQ2_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ2_S.gguf)     | 633       | 28.5477 | 18.64    | 72.32        | 0.54           |
-| [IQ2_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ2_M.gguf)     | 669       | 31.8272 | 19.70    | 64.87        | 0.63           |
-| [Q2_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q2_K_S.gguf)   | 683       | 28.7707 | 20.11    | 71.76        | 0.54           |
-| [Q2_K](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q2_K.gguf)       | 718       | 27.6342 | 21.14    | 74.71        | 0.51           |
-| [IQ3_XXS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ3_XXS.gguf) | 733       | 23.5511 | 21.58    | 87.66        | 0.44           |
-| [IQ3_XS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ3_XS.gguf)   | 793       | 22.9887 | 23.35    | 89.81        | 0.42           |
-| [Q3_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q3_K_S.gguf)   | 821       | 28.0462 | 24.17    | 73.61        | 0.53           |
-| [IQ3_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ3_S.gguf)     | 822       | 22.9268 | 24.20    | 90.05        | 0.42           |
-| [IQ3_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ3_M.gguf)     | 836       | 22.3167 | 24.62    | 92.51        | 0.41           |
-| [Q3_K_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q3_K_M.gguf)   | 881       | 22.5727 | 25.94    | 91.46        | 0.41           |
-| [Q3_K_L](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q3_K_L.gguf)   | 935       | 22.3758 | 27.53    | 92.27        | 0.41           |
-| [IQ4_XS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ4_XS.gguf)   | 972       | 21.3273 | 28.62    | 96.80        | 0.38           |
-| [IQ4_NL](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ4_NL.gguf)   | 1018      | 21.3234 | 29.98    | 96.82        | 0.38           |
-| [Q4_0](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q4_0.gguf)       | 1019      | 22.5210 | 30.00    | 91.67        | 0.41           |
-| [Q4_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q4_K_S.gguf)   | 1022      | 21.1717 | 30.09    | 97.51        | 0.38           |
-| [Q4_K_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q4_K_M.gguf)   | 1065      | 21.0532 | 31.36    | 98.06        | 0.38           |
-| [Q4_1](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q4_1.gguf)       | 1109      | 21.1492 | 32.66    | 97.62        | 0.38           |
-| [Q5_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q5_K_S.gguf)   | 1201      | 20.7883 | 35.37    | 99.31        | 0.37           |
-| [Q5_0](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q5_0.gguf)       | 1203      | 20.8643 | 35.42    | 98.95        | 0.37           |
-| [Q5_K_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q5_K_M.gguf)   | 1226      | 20.7488 | 36.10    | 99.50        | 0.37           |
-| [Q5_1](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q5_1.gguf)       | 1293      | 20.7773 | 38.07    | 99.37        | 0.37           |
-| [Q6_K](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q6_K.gguf)       | 1396      | 20.6994 | 41.11    | 99.74        | 0.37           |
-| [Q8_0](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-Q8_0.gguf)       | 1807      | 20.6659 | 53.21    | 99.90        | 0.37           |
-| [F16](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-F16.gguf)         | 3396      | 20.6457 | 100      | 100          | 0.37           |
-<hr>
----
-license: mit
-train: false
-inference: true
-pipeline_tag: text-generation
-base_model:
-- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
----
-This is a version of the <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B">DeepSeek-R1-Distill-Qwen-1.5B</a> model re-distilled for better performance.
-## Performance
-| Models            | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B">DeepSeek-R1-Distill-Qwen-1.5B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1">DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1</a> |
-|:-------------------:|:--------:|:----------------:|
-| ARC (25-shot)      | 40.96 | <b>41.55</b>  |
-| HellaSwag (10-shot)| 44    | <b>45.88</b> |
-| MMLU (5-shot)      | 39.27 | <b>41.82</b> |
-| TruthfulQA-MC2     | 45.17 | <b>46.63</b> |
-| Winogrande (5-shot)| 55.49 | <b>57.7</b> |
-| GSM8K (5-shot)     | 69.9  | <b>74.3</b> |
-| Average            | 49.13 | <b>51.31</b> |
-| Models            | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B">DeepSeek-R1-Distill-Qwen-1.5B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1">DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1</a>  |
-|:-------------------:|:--------:|:----------------:|
-| GPQA (0-shot)     | 26.96 | <b>26.99</b>  |
-| MMLU PRO (5-shot) | 16.74 | <b>19.86</b> |
-| MUSR (0-shot)     | 35.93 | <b>36.6</b> |
-| BBH (3-shot)      | 35.12 | <b>37.23</b> |
-| IfEval (0-shot)   | 24.94 | <b>27.22</b> |
-## Usage
-```Python
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
-compute_dtype = torch.bfloat16
-device   = 'cuda'
-model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1"
-model     = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa", device_map=device)
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-prompt  = "What is 1.5+102.2?"
-chat    = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
-outputs = model.generate(chat.to(device), max_new_tokens=1024, do_sample=True)
-print(tokenizer.decode(outputs[0]))
-```
-Output:
-```
-<｜begin▁of▁sentence｜><｜User｜>What is 1.5+102.2?<｜Assistant｜><think>
-First, I identify the numbers involved in the addition: 1.5 and 102.2.
-Next, I add the whole numbers: 1 + 102 equals 103.
-Then, I add the decimal parts: 0.5 + 0.2 equals 0.7.
-Finally, I combine the results: 103 + 0.7 equals 103.7.
-</think>
-To solve the addition \(1.5 + 102.2\), follow these steps:
-1. **Add the whole numbers:**
-   \[
-   1 + 102 = 103
-   \]
-2. **Add the decimal parts:**
-   \[
-   0.5 + 0.2 = 0.7
-   \]
-3. **Combine the results:**
-   \[
-   103 + 0.7 = 103.7
-   \]
-So, the final answer is \(\boxed{103.7}\).<｜end▁of▁sentence｜>
-```
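In the perplexity table of the deleted README, the `Size (%)` and `Accuracy (%)` columns appear to be derived from the F16 row: each quant's size divided by 3396 MB, and the F16 perplexity (20.6457) divided by the quant's perplexity. The sketch below reproduces a few of the published values under that assumption. The formula is inferred from the numbers rather than stated in the card, and the helper names are illustrative.

```python
# Reference values for the unquantized F16 model, copied from the table.
F16_SIZE_MB = 3396
F16_PPL = 20.6457


def size_pct(size_mb: float) -> float:
    """File size as a percentage of the F16 file, rounded to 2 decimals."""
    return round(100 * size_mb / F16_SIZE_MB, 2)


def ppl_accuracy_pct(ppl: float) -> float:
    """F16 perplexity divided by the quant's perplexity, as a percentage.

    Assumption: this is how the 'Accuracy (%)' column was computed; it
    matches the published numbers but is not documented in the card.
    """
    return round(100 * F16_PPL / ppl, 2)


# A few rows copied from the table: (quant, size in MB, perplexity).
rows = [
    ("IQ1_S", 489, 88.4250),
    ("IQ4_XS", 972, 21.3273),
    ("Q4_K_M", 1065, 21.0532),
    ("Q8_0", 1807, 20.6659),
]

for name, size_mb, ppl in rows:
    print(f"{name:7} size={size_pct(size_mb):6.2f}%  accuracy={ppl_accuracy_pct(ppl):6.2f}%")
```

If this reading is right, `Accuracy (%)` is a perplexity ratio rather than a task accuracy, so it is best read as relative fidelity to the F16 model.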
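The `Average` row of the first benchmark table in the deleted README matches the unweighted mean of its six task scores. A quick check, with the scores copied from that table (the dictionary keys are shortened task names chosen for illustration):

```python
# Scores copied from the first performance table (keys are shortened task names).
baseline = {"ARC": 40.96, "HellaSwag": 44.0, "MMLU": 39.27,
            "TruthfulQA-MC2": 45.17, "Winogrande": 55.49, "GSM8K": 69.9}
redistill = {"ARC": 41.55, "HellaSwag": 45.88, "MMLU": 41.82,
             "TruthfulQA-MC2": 46.63, "Winogrande": 57.7, "GSM8K": 74.3}


def mean(scores: dict[str, float]) -> float:
    """Unweighted mean over all tasks."""
    return sum(scores.values()) / len(scores)


# Per-task improvement of the re-distilled model over the original distill.
gains = {task: round(redistill[task] - baseline[task], 2) for task in baseline}

print("baseline average:", round(mean(baseline), 2))
print("re-distilled average:", round(mean(redistill), 2))
print("per-task gains:", gains)
```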
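The sample output in the deleted README's usage section contains the echoed prompt, DeepSeek special tokens, and a `<think>...</think>` reasoning block before the final answer. A minimal sketch for separating the reasoning from the answer, assuming exactly that layout; `split_reasoning` is a hypothetical helper, not part of `transformers`:

```python
def split_reasoning(decoded: str) -> tuple[str, str]:
    """Split decoded R1-style output into (reasoning, answer).

    Assumes the layout of the README's sample output: an optional echoed
    prompt ending in <｜Assistant｜>, then <think>...</think>, then the answer.
    """
    # Drop the sentence-boundary special tokens seen in the sample output.
    for token in ("<｜begin▁of▁sentence｜>", "<｜end▁of▁sentence｜>"):
        decoded = decoded.replace(token, "")
    # Discard the echoed prompt, if present (everything up to the assistant marker).
    _, sep, rest = decoded.partition("<｜Assistant｜>")
    text = rest if sep else decoded
    # Everything before </think> is reasoning; everything after is the answer.
    head, sep, answer = text.partition("</think>")
    if not sep:  # no reasoning block found
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()
```

Decoding only the newly generated tokens, e.g. `tokenizer.decode(outputs[0][chat.shape[-1]:])` with the variables from the usage snippet, avoids the prompt echo in the first place; the helper still works then, because it only needs the closing `</think>` tag.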