---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- DeepSeek-R1-Distill-Llama-8B
---
Quantizations of https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B

### Open source inference clients/UIs
* [llama.cpp](https://github.com/ggerganov/llama.cpp) (quick-start sketch below)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [ollama](https://github.com/ollama/ollama)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [jan](https://github.com/janhq/jan)
* [GPT4All](https://github.com/nomic-ai/gpt4all)
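
Any of the clients above can load these GGUF files. As a minimal sketch, a llama.cpp invocation might look like the following; the quantization filename is an assumption -- substitute the actual file you downloaded from this repo:

```shell
# Minimal llama.cpp sketch; the .gguf filename is illustrative, use the actual
# quant file from this repo. This starts an interactive chat session.
./llama-cli -m DeepSeek-R1-Distill-Llama-8B.Q4_K_M.gguf --temp 0.6
```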

### Closed source inference clients/UIs
* [LM Studio](https://lmstudio.ai/)
* [Msty](https://msty.app/)
* [Backyard AI](https://backyard.ai/)

---

# From original readme

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance.
Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors.
However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

## How to Run Locally

### DeepSeek-R1 Models

Please visit the [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running DeepSeek-R1 locally.

**NOTE: Hugging Face's Transformers does not yet directly support these models.**

### DeepSeek-R1-Distill Models

DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.

For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):

```shell
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
```
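
For the Llama-8B distill that this repo quantizes, a plausible single-GPU equivalent would be the following (an assumption; the original card only shows the 32B command):

```shell
# Sketch for the 8B distill; single-GPU serving is assumed, so
# --tensor-parallel-size is omitted.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-8B --max-model-len 32768 --enforce-eager
```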

You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):

```bash
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
```

### Usage Recommendations

**We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including when benchmarking, to achieve the expected performance:**

1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs (applied in the request sketch after this list).
2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.**
3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
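
As an illustration, a request that follows these recommendations, sent to the vLLM server from the example above, might look like the sketch below. The localhost endpoint and port are assumptions based on vLLM's default OpenAI-compatible server, and the arithmetic question is a placeholder:

```shell
# Hedged sketch: temperature 0.6, no system message, and the recommended math
# directive, sent to a vLLM OpenAI-compatible endpoint (default port assumed).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    "temperature": 0.6,
    "messages": [
      {"role": "user", "content": "Please reason step by step, and put your final answer within \\boxed{}. What is 17 * 24?"}
    ]
  }'
```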

Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting "\<think\>\n\n\</think\>") when responding to certain queries, which can adversely affect the model's performance.
**To ensure that the model engages in thorough reasoning, we recommend forcing the model to begin every response with "\<think\>\n".**
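
With the GGUF files in this repo, one hedged way to do this is to run llama.cpp in raw-completion mode and end the prompt with the opening tag yourself, so generation continues inside the reasoning block. The filename is illustrative as before, and the -no-cnv flag (which disables the default chat-template conversation mode) is assumed to be available in your llama.cpp build:

```shell
# Sketch: ending the raw prompt with "<think>\n" seeds the reasoning block.
# -no-cnv disables conversation mode; $'...' turns \n into a real newline
# and \\boxed into a literal \boxed.
./llama-cli -m DeepSeek-R1-Distill-Llama-8B.Q4_K_M.gguf --temp 0.6 -n 2048 -no-cnv \
  -p $'Please reason step by step, and put your final answer within \\boxed{}. What is 17 * 24?\n<think>\n'
```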