YOYO-AI committed on
Commit 2ceb596 · verified · 1 Parent(s): 2fa4cb3

Update README.md

Files changed (1)
  1. README.md +52 -3
README.md CHANGED
@@ -1,3 +1,52 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ base_model:
+ - deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
+ - AXCXEPT/Qwen3-EZO-8B-beta
+ pipeline_tag: text-generation
+ tags:
+ - merge
+ ---
+ > [!TIP]
+ > The Karcher merge method does not require a base model. See the [mergekit documentation](https://github.com/arcee-ai/mergekit/blob/main/docs/merge_methods.md#karcher-mean-karcher) for details.
+
+ # *Model Highlights:*
+
+ - ***Merge method**: `karcher`*
+
+ - ***Highest precision**: `dtype: float32` + `out_dtype: bfloat16`*
+
+ - ***Brand-new chat template**: ensures normal operation on LM Studio*
+
+ - ***Context length**: `32768`*
+ ## *Model Selection Table:*
+ |Model|Context|Uses Base Model|
+ |---|---|---|
+ |[Qwen3-8B-YOYO-karcher](https://huggingface.co/YOYO-AI/Qwen3-8B-YOYO-karcher)|32K|NO|
+ |[Qwen3-8B-YOYO-karcher-128K](https://huggingface.co/YOYO-AI/Qwen3-8B-YOYO-karcher-128K)|128K|NO|
+ |[Qwen3-EZO-8B-YOYO-karcher](https://huggingface.co/YOYO-AI/Qwen3-EZO-8B-YOYO-karcher)|32K|NO|
+ |[Qwen3-EZO-8B-YOYO-karcher-128K](https://huggingface.co/YOYO-AI/Qwen3-EZO-8B-YOYO-karcher-128K)|128K|NO|
+ > **Warning**:
+ > *Models with `128K` context may show a slight quality loss. In most cases, use the native `32K` context.*
+ # *Parameter Settings:*
+ ## *Thinking Mode:*
+ > [!NOTE]
+ > *`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.*
+
+ # *Configuration:*
+ *The following YAML configuration was used to produce this model:*
+
+ ```yaml
+ models:
+   - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
+   - model: AXCXEPT/Qwen3-EZO-8B-beta
+ merge_method: karcher
+ parameters:
+   max_iter: 1000
+ dtype: float32
+ out_dtype: bfloat16
+ tokenizer_source: AXCXEPT/Qwen3-EZO-8B-beta
+ ```
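
The configuration in this diff selects `merge_method: karcher` with `max_iter: 1000`. As a rough illustration of what a Karcher (Riemannian) mean computes, the sketch below averages unit vectors on a sphere by repeatedly mapping the points into the tangent space at the current estimate, averaging there, and mapping the result back. This is not mergekit's implementation: the function name `karcher_mean_sphere`, the `tol` default, and the choice of the unit sphere as the manifold are assumptions made for illustration, and a real merge operates on full weight tensors rather than toy 2-D vectors.

```python
import math


def karcher_mean_sphere(points, weights=None, max_iter=1000, tol=1e-9):
    """Weighted Karcher mean of vectors projected onto the unit sphere.

    Illustrative only. Assumes the inputs are non-zero and that their
    weighted Euclidean average is non-zero (e.g. no antipodal pair).
    """
    n = len(points)
    if weights is None:
        weights = [1.0 / n] * n
    dim = len(points[0])

    def normalize(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v]

    pts = [normalize(p) for p in points]
    # Initial guess: the normalized weighted Euclidean average.
    mu = normalize(
        [sum(w * p[i] for w, p in zip(weights, pts)) for i in range(dim)]
    )

    for _ in range(max_iter):
        tangent = [0.0] * dim
        for w, p in zip(weights, pts):
            cos_t = max(-1.0, min(1.0, sum(a * b for a, b in zip(mu, p))))
            theta = math.acos(cos_t)  # geodesic distance from mu to p
            if theta < 1e-12:
                continue  # p coincides with the current estimate
            # Log map: unit direction from mu toward p, scaled by theta.
            residual = [b - cos_t * a for a, b in zip(mu, p)]
            r_norm = math.sqrt(sum(c * c for c in residual))
            for i in range(dim):
                tangent[i] += w * theta * residual[i] / r_norm

        step = math.sqrt(sum(c * c for c in tangent))
        if step < tol:
            break  # the weighted tangent vectors cancel: mu is the mean
        # Exponential map: move from mu along the averaged tangent vector.
        mu = normalize(
            [
                math.cos(step) * a + math.sin(step) * t / step
                for a, t in zip(mu, tangent)
            ]
        )
    return mu
```

For two equally weighted points the result is the geodesic midpoint, so `karcher_mean_sphere([[1.0, 0.0], [0.0, 1.0]])` returns a vector close to `[0.7071, 0.7071]`. Unlike a plain weighted average of the raw vectors, the result always stays on the sphere.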