DS-R1-0528-Qwen3-YOYO-merge
Collection: 16 items
The Karcher merge method does not require a base model.
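To illustrate why no base model is needed, here is a toy sketch of a Karcher (Riemannian) mean computed directly from the input models' weights. It assumes the method normalizes each flattened weight vector, averages the resulting directions on the unit sphere with repeated log-map/exp-map steps, and rescales by the mean norm. That is how the technique is commonly described, but this is not mergekit's implementation, and the names `karcher_mean` and `tol` are hypothetical. `max_iter` plays the same role as the `max_iter: 1000` cap in this card's YAML configuration.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(a: Sequence[float]) -> float:
    return math.sqrt(_dot(a, a))


def karcher_mean(
    weights: Sequence[Sequence[float]], max_iter: int = 1000, tol: float = 1e-9
) -> List[float]:
    """Toy Karcher mean of flattened weight vectors; no base model is involved.

    Hypothetical helper, not mergekit's code. Assumes non-zero inputs that all lie
    within a common open hemisphere (no antipodal pairs).
    """
    n = len(weights)
    norms = [_norm(w) for w in weights]
    units = [[x / r for x in w] for w, r in zip(weights, norms)]
    dim = len(units[0])

    # Initial guess: the normalized Euclidean average of the unit directions.
    mu = [sum(u[i] for u in units) / n for i in range(dim)]
    r = _norm(mu)
    mu = [x / r for x in mu]

    for _ in range(max_iter):
        # Log map: average the tangent vectors pointing from mu toward each input.
        tangent = [0.0] * dim
        for u in units:
            cos_t = max(-1.0, min(1.0, _dot(mu, u)))
            theta = math.acos(cos_t)
            if theta < 1e-12:
                continue  # this input sits at mu and contributes a zero tangent
            scale = theta / math.sin(theta) / n
            for i in range(dim):
                tangent[i] += scale * (u[i] - cos_t * mu[i])
        step = _norm(tangent)
        if step < tol:
            break  # mean tangent is ~0, so mu is the Karcher mean
        # Exp map: walk along the great circle from mu in the mean tangent direction.
        mu = [math.cos(step) * m + math.sin(step) * t / step for m, t in zip(mu, tangent)]

    # Rescale the mean direction by the average magnitude of the inputs.
    avg_norm = sum(norms) / n
    return [avg_norm * m for m in mu]


if __name__ == "__main__":
    # Unit vectors at 0, 0 and 90 degrees: the intrinsic mean sits at 30 degrees,
    # whereas the normalized linear average sits at about 26.6 degrees.
    print(karcher_mean([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

For two inputs the result is simply the great-circle midpoint; the iteration matters when inputs are spread unevenly, as in the three-vector example above.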
Merge method: karcher
Highest precision: merge computed in float32 (dtype: float32), output saved in bfloat16 (out_dtype: bfloat16)
New chat template: ensures correct operation in LM Studio
Context length: 32768
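As an illustration of why the merge is computed in float32 and only rounded to bfloat16 on output, the sketch below rounds values to bfloat16 in pure Python (round to nearest, ties to even, which is what PyTorch's float32-to-bfloat16 conversion uses; NaN inputs are not handled) and compares rounding once at the end with rounding the inputs first. Python floats are float64 and stand in here for the float32 working precision; the helper names are hypothetical, and this is not how mergekit stores tensors.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to float32, then to bfloat16 (round to nearest, ties to even).

    Hypothetical helper for illustration; NaN inputs are not handled.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits: sign, 8 exponent bits, 7 mantissa bits.
    lsb = (bits >> 16) & 1  # lowest kept bit; ties round toward an even value
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def merge_then_cast(a: float, b: float) -> float:
    """Average at full working precision, then round to bfloat16 once."""
    return to_bfloat16((a + b) / 2)


def cast_then_merge(a: float, b: float) -> float:
    """Round the inputs to bfloat16 first, then average and round again."""
    return to_bfloat16((to_bfloat16(a) + to_bfloat16(b)) / 2)


if __name__ == "__main__":
    a, b = 1 + 2**-9, 1 + 7 * 2**-10
    print(merge_then_cast(a, b))  # 1.0078125
    print(cast_then_merge(a, b))  # 1.0
```

The exact mean of these two inputs is 1.00439453125, whose nearest bfloat16 value is 1.0078125; rounding the inputs first loses that and lands on 1.0.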
Model | Context | Uses base model |
---|---|---|
Qwen3-8B-YOYO-karcher | 32K | NO |
Qwen3-8B-YOYO-karcher-128K | 128K | NO |
Qwen3-EZO-8B-YOYO-karcher | 32K | NO |
Qwen3-EZO-8B-YOYO-karcher-128K | 128K | NO |
Warning: models with 128K context may show slight quality loss. In most cases, please use the native 32K context.
Recommended sampling parameters: Temperature=0.6, TopP=0.95, TopK=20, MinP=0.
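The sketch below shows what these four settings do to a next-token distribution, assuming they are applied in the order temperature, top-k, top-p, min-p. Inference runtimes differ in the exact order, so treat this as an illustration rather than the behavior of any particular engine; `filtered_distribution` and `sample_token` are hypothetical helper names.

```python
import math
import random
from typing import Dict, List


def filtered_distribution(
    logits: List[float],
    temperature: float = 0.6,
    top_p: float = 0.95,
    top_k: int = 20,
    min_p: float = 0.0,
) -> Dict[int, float]:
    """Return {token_id: probability} after temperature, top-k, top-p and min-p.

    Hypothetical helper; the filter order is an assumption.
    """
    # Temperature: scale logits, then softmax (subtracting the max for stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable token ids (top_k <= 0 disables the filter).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    ranked_mass = sum(probs[i] for i in ranked)
    kept: List[int] = []
    mass = 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i] / ranked_mass
        if mass >= top_p:
            break

    # Min-p: drop tokens below min_p times the top probability (min_p = 0 keeps all).
    threshold = min_p * probs[kept[0]]
    kept = [i for i in kept if probs[i] >= threshold]

    # Renormalize the survivors into a proper distribution.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample_token(logits: List[float], rng: random.Random) -> int:
    """Draw one token id using the recommended settings."""
    dist = filtered_distribution(logits)
    return rng.choices(list(dist), weights=list(dist.values()))[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]
    print(filtered_distribution(logits))
    print(sample_token(logits, random.Random(0)))
```

Because min-p compares against the top token's probability, MinP=0 leaves the distribution unchanged, so with these settings only temperature, top-k and top-p actually shape the output.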
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  - model: AXCXEPT/Qwen3-EZO-8B-beta
merge_method: karcher
parameters:
  max_iter: 1000
dtype: float32
out_dtype: bfloat16
tokenizer_source: AXCXEPT/Qwen3-EZO-8B-beta
```