  This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
  ## Merge Details
This model aims to combine code and math capabilities by merging Qwen 3 2507 with multiple Qwen 3 finetunes.

# How to run

You can run this model with any of the following interfaces.

## Transformers

As suggested by the Qwen team:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ertghiu256/Qwen3-4b-tcomanr-merge-v2"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
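The snippet above splits the output at the last occurrence of token id 151668 (`</think>` in Qwen3's vocabulary). The same logic can be checked in isolation on a toy token list — `split_thinking` is an illustrative helper, not part of the model's API, and every id below except 151668 is made up:

```python
def split_thinking(output_ids, think_end_id=151668):
    """Split token ids at the last </think> marker (id 151668 for Qwen3).

    Returns (thinking_ids, content_ids). The marker stays in the thinking
    slice, where skip_special_tokens would drop it at decode time; if the
    marker is absent, everything is treated as content.
    """
    try:
        # rindex: find the last occurrence by searching the reversed list
        index = len(output_ids) - output_ids[::-1].index(think_end_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]

# toy ids: 1, 2 are "thinking", 151668 is </think>, 3, 4 are the answer
thinking, content = split_thinking([1, 2, 151668, 3, 4])
```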

## vLLM

Run this command:
```bash
vllm serve ertghiu256/Qwen3-4b-tcomanr-merge-v2 --enable-reasoning --reasoning-parser deepseek_r1
```
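Once the server is up, vLLM exposes an OpenAI-compatible API (by default at `http://localhost:8000/v1`). A minimal sketch of building a chat-completion request for it — `build_chat_request` is an illustrative helper, and the host/port are vLLM defaults, not something this model requires:

```python
import json

# vLLM's OpenAI-compatible endpoint; port 8000 is the vLLM default
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(prompt, temperature=0.6, top_p=0.95, top_k=30):
    """Build the JSON body for a /chat/completions call, using the
    sampling parameters recommended for this model."""
    return {
        "model": "ertghiu256/Qwen3-4b-tcomanr-merge-v2",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,  # vLLM accepts top_k as an extra sampling parameter
    }

body = build_chat_request("Write a Python function that reverses a string.")
# send with e.g. requests.post(f"{BASE_URL}/chat/completions", json=body)
print(json.dumps(body, indent=2))
```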

## SGLang

Run this command:
```bash
python -m sglang.launch_server --model-path ertghiu256/Qwen3-4b-tcomanr-merge-v2 --reasoning-parser deepseek-r1
```

## llama.cpp

Run this command:
```bash
llama-server --hf-repo ertghiu256/Qwen3-4b-tcomanr-merge-v2
```
or
```bash
llama-cli -hf ertghiu256/Qwen3-4b-tcomanr-merge-v2
```

## Ollama

Run this command:
```bash
ollama run hf.co/ertghiu256/Qwen3-4b-tcomanr-merge-v2:Q8_0
```

## LM Studio

Search for
```
ertghiu256/Qwen3-4b-tcomanr-merge-v2
```
in the LM Studio model search list, then download it.

### Recommended parameters
```
temp: 0.6
num_ctx: ≥8192
top_p: 0.95
top_k: 30
```
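For Ollama, these parameters can be baked into a Modelfile so every `ollama run` picks them up — this is an illustrative sketch reusing the `Q8_0` tag from the Ollama section above, with `num_ctx` set to the 8192 minimum:

```
FROM hf.co/ertghiu256/Qwen3-4b-tcomanr-merge-v2:Q8_0
PARAMETER temperature 0.6
PARAMETER num_ctx 8192
PARAMETER top_p 0.95
PARAMETER top_k 30
```

Build it with `ollama create <your-model-name> -f Modelfile`.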

### Merge Method

This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method using [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) as a base.