AndrewYan committed · Commit 5fea8dc · verified · 1 Parent(s): 9989bf2

Update README.md

Files changed (1): README.md (+81 −3)

---
license: apache-2.0
---

# DistilQwen2.5-DS3-0324 Series: Fast-Thinking Reasoning Models

## 📖 Introduction

To address the industry challenge of balancing efficient reasoning with strong cognitive capability, the DistilQwen2.5-DS3-0324 series transfers the fast-thinking capabilities of DeepSeekV3-0324 to lightweight models. Through a two-stage distillation framework, the series achieves high performance while delivering:

- **Enhanced Reasoning Speed**: 60-80% fewer output tokens than comparable slow-thinking models
- **Reduced Resource Consumption**: small enough for edge-computing deployment
- **Elimination of Cognitive Mismatch**: a proprietary trajectory-alignment technique adapts teacher reasoning chains to the student model's capacity

## Core Innovations

### 1. Fast-Thinking Distillation Framework

- **Stage 1: Fast-Thinking CoT Data Collection**
  - **Long-to-Short Rewriting**: extracts the key reasoning steps from DeepSeek-R1 chains
  - **Teacher Model Distillation**: captures the rapid reasoning trajectories of DeepSeekV3-0324
- **Stage 2: CoT Trajectory Cognitive Alignment**
  - **Dynamic Difficulty Grading** (Easy/Medium/Hard)
    - An LLM-as-a-Judge rates how comprehensible each chain is for the small model
    - Chains rated Easy are expanded with the necessary intermediate steps
    - Chains rated Hard are simplified to remove high-level logical leaps
  - **Validation Mechanism**: grading and rewriting iterate until all data reaches a "Medium" rating (see the sketch after this list)

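The README stops at this description, but the control flow of Stage 2 is concrete enough to sketch. Below is a minimal, self-contained illustration of the grade-and-rewrite loop; `judge_difficulty`, `expand_chain`, and `simplify_chain` are hypothetical stand-ins for the LLM-as-a-Judge and rewriting prompts, not part of any released API (the length-based heuristics exist only to make the stub runnable).

```python
# Sketch of the Stage 2 alignment loop described above. In the actual
# pipeline each helper would be an LLM call; these are crude placeholders.

def judge_difficulty(cot: str) -> str:
    """LLM-as-a-Judge stand-in: rate a chain 'Easy', 'Medium', or 'Hard'
    for the student model (here, chain length is used as a proxy)."""
    if len(cot) < 200:
        return "Easy"
    if len(cot) > 2000:
        return "Hard"
    return "Medium"

def expand_chain(cot: str) -> str:
    """Stand-in for simple-chain expansion: add missing intermediate steps."""
    return cot + "\n(added intermediate step)"

def simplify_chain(cot: str) -> str:
    """Stand-in for hard-chain simplification: remove high-level leaps."""
    return cot[: max(200, len(cot) // 2)]

def align_trajectory(cot: str, max_rounds: int = 5) -> str:
    """Iterate grading and rewriting until the chain is rated 'Medium'."""
    for _ in range(max_rounds):
        rating = judge_difficulty(cot)
        if rating == "Medium":
            break
        cot = expand_chain(cot) if rating == "Easy" else simplify_chain(cot)
    return cot
```

In the actual pipeline this loop would run over the whole distillation corpus; the sketch only fixes the iteration structure implied above (rewrite until everything grades "Medium").
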
### 2. Performance Breakthroughs

- **32B model**: approaches, on the GPQA Diamond benchmark, the performance of closed-source models with 10x its parameter count
- **Significantly improved reasoning efficiency**: see the comparison table below and the worked numbers after it

| Model                                | MMLU-Pro Tokens | AIME2024 Tokens | Speed Gain |
|--------------------------------------|-----------------|-----------------|------------|
| DistilQwen2.5-R1-32B (slow-thinking) | 4198            | 12178           | 1x         |
| DistilQwen2.5-DS3-0324-32B           | 690             | 4177            | 5-8x       |

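As a sanity check, the token-reduction figures quoted in the introduction follow directly from the table; the snippet below just restates that arithmetic. Note that these are output-length ratios, which need not equal the wall-clock speed gain in the last column.

```python
# Token counts copied from the comparison table above.
slow = {"MMLU-Pro": 4198, "AIME2024": 12178}  # DistilQwen2.5-R1-32B
fast = {"MMLU-Pro": 690,  "AIME2024": 4177}   # DistilQwen2.5-DS3-0324-32B

for bench in slow:
    reduction = 1 - fast[bench] / slow[bench]
    ratio = slow[bench] / fast[bench]
    print(f"{bench}: {reduction:.0%} fewer tokens ({ratio:.1f}x shorter)")
# MMLU-Pro: 84% fewer tokens (6.1x shorter)
# AIME2024: 66% fewer tokens (2.9x shorter)
```
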
## Technical Advantages

- **Two-Stage Distillation**: first compresses reasoning length, then aligns cognitive trajectories
- **Dynamic Data Optimization**: adaptive difficulty adjustment keeps the distilled knowledge transferable to smaller models
- **Open-Source Compatibility**: fine-tuned from the Qwen2.5 base models

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "alibaba-pai/DistilQwen2.5-DS3-0324-32B"

# Load the model and tokenizer; device_map="auto" spreads the weights across
# the available GPU(s), so inputs are moved to model.device below.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant. You should think step-by-step."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,  # passes input_ids and attention_mask
    max_new_tokens=2048,
)
# Keep only the newly generated tokens (drop the echoed prompt).
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
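
Because the series' headline benefit is shorter outputs, it can be useful to log how many tokens a response actually used. This small addition is not part of the original snippet; it just reuses the variables defined above:

```python
# generated_ids already has the prompt tokens stripped, so the length of the
# first (and only) sequence is the number of newly generated tokens.
print(f"Output tokens: {len(generated_ids[0])}")
```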