pittawat commited on
Commit
4a9d791
·
verified ·
1 Parent(s): d155d4b

Add files using upload-large-folder tool

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,284 +1,36 @@
1
  ---
2
  license: gemma
3
  pipeline_tag: text-generation
 
 
 
 
4
  ---
5
 
6
- **Typhoon2.1-Gemma3-12B**: Thai Large Language Model (Instruct)
7
 
8
- **Typhoon2.1-Gemma3-12B** is a instruct Thai 🇹🇭 large language model with 12 billion parameters, a 128K context length, and function-calling capabilities. It is based on Gemma3 12B.
 
 
9
 
10
- Remark: This is text only model. We removed vision encoder for this version due to complexity. Stay-tune for version with vision encoder soon.
11
-
12
- ## **Performance**
13
-
14
- ![12b model performance](https://storage.googleapis.com/typhoon-public/assets/typhoon-21/performance12b_table.png)
15
-
16
- ## **Model Description**
17
-
18
- - **Model type**: A 12B instruct decoder-only model based on Gemma3 architecture.
19
- - **Requirement**: transformers 4.50.0 or newer.
20
- - **Primary Language(s)**: Thai 🇹🇭 and English 🇬🇧
21
- - **Context Length**: 128K
22
- - **License**: [Gemma License](https://github.com/google-deepmind/gemma/blob/main/LICENSE)
23
-
24
-
25
- ## Usage Example
26
-
27
- This code snippet shows how to use the Typhoon2.1-Gemma3-12B model for Thai or English text generation using the transformers library. It includes setting up the model and tokenizer, formatting chat messages in a system-user style, and generating a response.
28
-
29
- ```python
30
- from transformers import AutoTokenizer, AutoModelForCausalLM
31
- import torch
32
-
33
- model_id = "scb10x/typhoon2.1-gemma3-12b"
34
-
35
- tokenizer = AutoTokenizer.from_pretrained(model_id)
36
- model = AutoModelForCausalLM.from_pretrained(
37
- model_id,
38
- torch_dtype=torch.bfloat16,
39
- device_map="auto",
40
- )
41
-
42
- messages = [
43
- {"role": "system", "content": "You are a male AI assistant named Typhoon created by SCB 10X to be helpful, harmless, and honest. Typhoon is happy to help with analysis, question answering, math, coding, creative writing, teaching, role-play, general discussion, and all sorts of other tasks. Typhoon responds directly to all human messages without unnecessary affirmations or filler phrases like “Certainly!”, “Of course!”, “Absolutely!”, “Great!”, “Sure!”, etc. Specifically, Typhoon avoids starting responses with the word “Certainly” in any way. Typhoon follows this information in all languages, and always responds to the user in the language they use or request. Typhoon is now being connected with a human. Write in fluid, conversational prose, Show genuine interest in understanding requests, Express appropriate emotions and empathy. Also showing information in term that is easy to understand and visualized."},
44
- {"role": "user", "content": "ขอสูตรไก่ย่าง"},
45
- ]
46
-
47
- input_ids = tokenizer.apply_chat_template(
48
- messages,
49
- add_generation_prompt=True,
50
- return_tensors="pt",
51
- enable_thinking=False # Switches between thinking and non-thinking modes. Default is False.
52
- ).to(model.device)
53
-
54
- outputs = model.generate(
55
- input_ids,
56
- max_new_tokens=512,
57
- do_sample=True,
58
- temperature=0.6,
59
- top_p=0.95,
60
- )
61
- response = outputs[0][input_ids.shape[-1]:]
62
- print(tokenizer.decode(response, skip_special_tokens=True))
63
- ```
64
-
65
- ## Deploy as Server
66
-
67
- This section shows how to run Typhoon2.1 as an OpenAI-compatible API server using vllm.
68
 
69
  ```bash
70
- pip install vllm
71
- vllm serve scb10x/typhoon2.1-gemma3-12b --max-model-len 16000 --dtype bfloat16 --tool-call-parser pythonic --enable-auto-tool-choice
72
- # adjust --max-model-len based on your avaliable memory
73
- # you can use --quantization bitsandbytes to reduce the memory use while trade-off inference speed
74
- ```
75
-
76
- ## Using Tools
77
-
78
- You can provide tools to the vLLM-powered OpenAI-compatible API for functionality.
79
-
80
  ```
81
- from openai import OpenAI
82
- import json
83
-
84
- client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
85
-
86
- def get_weather(location: str, unit: str):
87
- return f"Getting the weather for {location} in {unit}..."
88
- tool_functions = {"get_weather": get_weather}
89
-
90
- tools = [{
91
- "type": "function",
92
- "function": {
93
- "name": "get_weather",
94
- "description": "Get the current weather in a given location",
95
- "parameters": {
96
- "type": "object",
97
- "properties": {
98
- "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
99
- "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
100
- },
101
- "required": ["location", "unit"]
102
- }
103
- }
104
- }]
105
-
106
- response = client.chat.completions.create(
107
- model=client.models.list().data[0].id,
108
- messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
109
- tools=tools,
110
- tool_choice="auto"
111
- )
112
-
113
- tool_call = response.choices[0].message.tool_calls[0].function
114
- print(f"Function called: {tool_call.name}")
115
- print(f"Arguments: {tool_call.arguments}")
116
- print(f"Result: {get_weather(**json.loads(tool_call.arguments))}")
117
- ```
118
-
119
-
120
- ## Switching Between Thinking and Non-Thinking Mode
121
-
122
- Typhoon supports two modes:
123
- Non-thinking mode (default): Fast response generation without extra reasoning steps.
124
- Thinking mode: The model first reasons internally, then provides a clearer and potentially more accurate final answer.
125
- You can enable thinking mode by:
126
- Setting enable_thinking=True in apply_chat_template.
127
- Using a special system prompt that instructs the model to reason inside <think>...</think> tags.
128
-
129
- You can turn on thinking mode by either
130
- - add enable_thinking=True to apply_chat_template
131
 
132
  ```python
133
- input_ids = tokenizer.apply_chat_template(
134
- messages,
135
- add_generation_prompt=True,
136
- return_tensors="pt",
137
- enable_thinking=True # Switches between thinking and non-thinking modes. Default is False.
138
- ).to(model.device)
139
- ```
140
-
141
- - manually by supply thinking mode system prompt
142
-
143
- ```
144
- You are a helpful assistant. First, think through the reasoning internally, then present the reasoning within <think>...</think>. After thinking, clearly state a response that addresses the user's request and aligns with their preferences, not just providing a direct answer.
145
- ```
146
-
147
- - in vllm powered openai compatible client you can add chat_template_kwargs to the post payload
148
- ```json
149
- {
150
- "model": "scb10x/typhoon2.1-gemma3-12b",
151
- "messages": [
152
- {"role": "user", "content": "Give me a short introduction to large language models."}
153
- ],
154
- "chat_template_kwargs": {"enable_thinking": true}
155
- }
156
- ```
157
-
158
- ## Budget forcing
159
-
160
- This section introduces budget forcing, an advanced technique to let the model spend more time and tokens reasoning before producing a final answer—great for improving performance on complex questions.
161
-
162
- ```
163
- from vllm import LLM, SamplingParams
164
- from transformers import AutoTokenizer
165
- class BudgetForcingHandler:
166
-
167
- def __init__(self, model_name: str, max_think_token: int, max_ignore=5, temperature=0.6, seed=32):
168
- self.temperature = temperature
169
- self.seed = seed
170
- self.max_think_token = max_think_token
171
- self.max_ignore = max_ignore
172
- self.model = LLM(model_name, dtype='bfloat16', enforce_eager=True)
173
- self.tokenizer = AutoTokenizer.from_pretrained(model_name)
174
- self.alternative_str = '\nAlternatively'
175
- self.system = """You are a reasoning assistant. First, think through the reasoning internally, then present the reasoning within <think>...</think>. After thinking, clearly state the final answer."""
176
-
177
- def __call__(self, prompts: List[str]):
178
- count_prompt = len(prompts)
179
- prompts = [self.tokenizer.apply_chat_template([{'role': 'system', 'content': self.system}, {'role': 'user', 'content': f'Please solve this math question, and put your final answer within \\boxed{{}}.\n{p}'}], add_generation_prompt=True, tokenize=False) for p in prompts]
180
- sampling_params = SamplingParams(
181
- max_tokens=self.max_think_token,
182
- seed=self.seed,
183
- stop=["</think>"],
184
- skip_special_tokens=False,
185
- temperature=self.temperature,
186
- )
187
- o = self.model.generate(
188
- prompts,
189
- sampling_params=sampling_params
190
- )
191
-
192
- outputs = [output.outputs[0].text for output in o]
193
- token_count = [len(output.outputs[0].token_ids) for output in o]
194
- for i in range(len(prompts)):
195
- prompts[i] = prompts[i] + outputs[i]
196
-
197
- for _ in range(self.max_ignore): # Num of times to skip stop token
198
- inference_loop_prompts = []
199
- inference_idx = []
200
- max_inference_token = 0
201
- print('current token count: ', token_count)
202
- for i in range(len(prompts)):
203
- left_budget = self.max_think_token - token_count[i]
204
- if left_budget > 0:
205
- prompts[i] = prompts[i] + self.alternative_str
206
- inference_loop_prompts.append(prompts[i])
207
- inference_idx.append(i)
208
- if left_budget > max_inference_token:
209
- max_inference_token = left_budget
210
-
211
- outputs = ['' for _ in range(len(prompts))]
212
- if max_inference_token == 0 or len(inference_loop_prompts) == 0:
213
- break
214
- sampling_params = SamplingParams(
215
- max_tokens=max_inference_token,
216
- min_tokens=1,
217
- seed=self.seed,
218
- stop=["</think>"],
219
- skip_special_tokens=False,
220
- temperature=self.temperature,
221
- )
222
- o = self.model.generate(
223
- inference_loop_prompts,
224
- sampling_params=sampling_params
225
- )
226
- assert len(inference_idx) == len(inference_loop_prompts)
227
- assert len(inference_idx) == len(o)
228
- for i, output in zip(inference_idx, o):
229
- outputs[i] = output.outputs[0].text
230
-
231
- for i, idx in enumerate(inference_idx):
232
- token_count[idx] = token_count[idx] + len(o[i].outputs[0].token_ids)
233
-
234
- for i in range(len(prompts)):
235
- prompts[i] = prompts[i] + outputs[i]
236
- print('generating answer...')
237
- prompts = [p + '\nTime\'s up. End of thinking process. Will answer immediately.\n</think>' for i, p in enumerate(prompts)]
238
- sampling_params = SamplingParams(
239
- max_tokens=2048,
240
- min_tokens=0,
241
- seed=self.seed,
242
- skip_special_tokens=False,
243
- temperature=self.temperature,
244
- )
245
- o = self.model.generate(
246
- prompts,
247
- sampling_params=sampling_params,
248
- )
249
- for i in range(len(prompts)):
250
- prompts[i] = prompts[i] + o[i].outputs[0].text
251
- assert len(prompts) == count_prompt
252
- return prompts
253
-
254
- handler = BudgetForcingHandler("scb10x/typhoon2.1-gemma3-12b", max_think_token=2048)
255
- handler(["How many r in raspberry?"])
256
- ```
257
-
258
- ## **Intended Uses & Limitations**
259
-
260
- This model is an instructional model. However, it’s still undergoing development. It incorporates some level of guardrails, but it still may produce answers that are inaccurate, biased, or otherwise objectionable in response to user prompts. We recommend that developers assess these risks in the context of their use case.
261
-
262
- ## **Follow us**
263
-
264
- **https://twitter.com/opentyphoon**
265
-
266
- ## **Support**
267
 
268
- **https://discord.gg/us5gAYmrxw**
269
 
 
270
 
271
- ## **Citation**
 
 
 
 
272
 
273
- - If you find Typhoon2 useful for your work, please cite it using:
274
  ```
275
- @misc{typhoon2,
276
- title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
277
- author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
278
- year={2024},
279
- eprint={2412.13702},
280
- archivePrefix={arXiv},
281
- primaryClass={cs.CL},
282
- url={https://arxiv.org/abs/2412.13702},
283
- }
284
- ```
 
1
  ---
2
  license: gemma
3
  pipeline_tag: text-generation
4
+ base_model: scb10x/typhoon2.1-gemma3-12b
5
+ library_name: mlx
6
+ tags:
7
+ - mlx
8
  ---
9
 
10
+ # scb10x/typhoon2.1-gemma3-12b-mlx-4bit
11
 
12
+ This model [scb10x/typhoon2.1-gemma3-12b-mlx-4bit](https://huggingface.co/scb10x/typhoon2.1-gemma3-12b-mlx-4bit) was
13
+ converted to MLX format from [scb10x/typhoon2.1-gemma3-12b](https://huggingface.co/scb10x/typhoon2.1-gemma3-12b)
14
+ using mlx-lm version **0.25.2**.
15
 
16
+ ## Use with mlx
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
17
 
18
  ```bash
19
+ pip install mlx-lm
 
 
 
 
 
 
 
 
 
20
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
21
 
22
  ```python
23
+ from mlx_lm import load, generate
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
24
 
25
+ model, tokenizer = load("scb10x/typhoon2.1-gemma3-12b-mlx-4bit")
26
 
27
+ prompt = "hello"
28
 
29
+ if tokenizer.chat_template is not None:
30
+ messages = [{"role": "user", "content": prompt}]
31
+ prompt = tokenizer.apply_chat_template(
32
+ messages, add_generation_prompt=True
33
+ )
34
 
35
+ response = generate(model, tokenizer, prompt=prompt, verbose=True)
36
  ```
 
 
 
 
 
 
 
 
 
 
added_tokens.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ {
2
+ "<image_soft_token>": 262144
3
+ }
chat_template.jinja ADDED
@@ -0,0 +1,146 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {#- Begin-of-sequence token to start the model prompt -#}
2
+ {{ bos_token }}
3
+ {#- Extracts the system message. Gemma does not support system messages so it will be prepended to first user message. -#}
4
+ {%- if messages[0]['role'] == 'system' -%}
5
+ {%- if messages[0]['content'] is string -%}
6
+ {%- set first_user_prefix = messages[0]['content'] -%}
7
+ {%- else -%}
8
+ {%- set first_user_prefix = messages[0]['content'][0]['text'] -%}
9
+ {%- endif -%}
10
+ {%- set loop_messages = messages[1:] -%}
11
+ {%- else -%}
12
+ {%- set first_user_prefix = "You are a helpful assistant named Typhoon created by SCB 10X to be helpful, harmless, and honest." -%}
13
+ {%- set loop_messages = messages -%}
14
+ {%- endif -%}
15
+ {%- if enable_thinking is defined and enable_thinking is true %}
16
+ {%- set first_user_prefix = first_user_prefix + " First, think through the reasoning internally, then present the reasoning within <think>...</think>. After thinking, clearly state a response that addresses the user's request and aligns with their preferences, not just providing a direct answer." -%}
17
+ {%- endif %}
18
+ {%- set first_user_prefix = first_user_prefix + '\n\n' -%}
19
+ {#- Set tools to none if not defined for this ChatCompletion request (helps avoid errors later) -#}
20
+ {%- if not tools is defined %}
21
+ {%- set tools = none %}
22
+ {%- endif %}
23
+
24
+ {#- If given only system message -#}
25
+ {%- if loop_messages|length == 0 -%}
26
+ {{ '<start_of_turn>user\n' -}}
27
+ {{ first_user_prefix }}
28
+ {#- Append system message with tool information if using tools in message request. -#}
29
+ {%- if tools is not none -%}
30
+ {{- "Tools (functions) are available. If you decide to invoke one or more of the tools, you must respond with a python list of the function calls.\n" -}}
31
+ {{- "Example Format: [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] \n" -}}
32
+ {{- "Do not use variables. DO NOT USE MARKDOWN SYNTAX. You SHOULD NOT include any other text in the response if you call a function. If none of the functions can be used, point it out. If you lack the parameters required by the function, also point it out.\n" -}}
33
+ {{- "Here is a list of functions in JSON format that you can invoke.\n" -}}
34
+ {{- tools | tojson(indent=4) -}}
35
+ {{- "\n\n" -}}
36
+ {%- endif -%}
37
+ {%- endif %}
38
+
39
+ {#- Main loop over all messages in the conversation history -#}
40
+ {%- for message in loop_messages -%}
41
+ {#- Normalize roles for model prompt formatting -#}
42
+ {%- if (message['role'] == 'assistant') -%}
43
+ {%- set role = "model" -%}
44
+ {%- elif (message['role'] == 'tool') -%}
45
+ {%- set role = "user" -%}
46
+ {%- else -%}
47
+ {%- set role = message['role'] -%}
48
+ {%- endif -%}
49
+ {#- Mark the start of a message block with the appropriate role -#}
50
+ {{ '<start_of_turn>' + role + '\n' -}}
51
+
52
+ {#- Insert system message content (if present) at the beginning of the first message. -#}
53
+ {%- if loop.first -%}
54
+ {{ first_user_prefix }}
55
+ {#- Append system message with tool information if using tools in message request. -#}
56
+ {%- if tools is not none -%}
57
+ {{- "Tools (functions) are available. If you decide to invoke one or more of the tools, you must respond with a python list of the function calls.\n" -}}
58
+ {{- "Example Format: [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] \n" -}}
59
+ {{- "Do not use variables. DO NOT USE MARKDOWN SYNTAX. You SHOULD NOT include any other text in the response if you call a function. If none of the functions can be used, point it out. If you lack the parameters required by the function, also point it out.\n" -}}
60
+ {{- "Here is a list of functions in JSON format that you can invoke.\n" -}}
61
+ {{- tools | tojson(indent=4) -}}
62
+ {{- "\n\n" -}}
63
+ {%- endif -%}
64
+ {%- endif -%}
65
+
66
+ {#- Format model tool calls (turns where model indicates they want to call a tool) -#}
67
+ {%- if 'tool_calls' in message -%}
68
+ {#- Opening bracket for tool call list. -#}
69
+ {{- '[' -}}
70
+ {#- For each tool call -#}
71
+ {%- for tool_call in message.tool_calls -%}
72
+ {#- Function name & opening parenthesis. -#}
73
+ {%- if tool_call.function is defined -%}
74
+ {%- set tool_call = tool_call.function -%}
75
+ {%- endif -%}
76
+ {{- tool_call.name + '(' -}}
77
+
78
+ {#-- Handle arguments as list (positional) or dict (named) --#}
79
+ {#-- Named arguments (dict) --#}
80
+ {%- if tool_call.arguments is iterable and tool_call.arguments is mapping -%}
81
+ {%- set first = true -%}
82
+ {%- for key, val in tool_call.arguments.items() -%}
83
+ {%- if not first %}, {% endif -%}
84
+ {{ key }}={{ val | tojson }}
85
+ {%- set first = false -%}
86
+ {%- endfor -%}
87
+ {#-- Positional arguments (list) --#}
88
+ {%- elif tool_call.arguments is iterable -%}
89
+ {{- tool_call.arguments | map('tojson') | join(', ') -}}
90
+ {#-- Fallback: single positional value --#}
91
+ {%- else -%}
92
+ {{- tool_call.arguments | tojson -}}
93
+ {#-- Closing parenthesis. --#}
94
+ {%- endif -%}
95
+ {{- ')' -}}
96
+ {#-- If more than one tool call, place comma and move to formatting next tool call --#}
97
+ {%- if not loop.last -%}, {% endif -%}
98
+ {%- endfor -%}
99
+ {#- Closing bracket for tool call list. -#}
100
+ {{- ']' -}}
101
+ {%- endif -%}
102
+
103
+ {#- Tool response start tag (for messages from a tool) -#}
104
+ {%- if (message['role'] == 'tool') -%}
105
+ {{ '<tool_response>\n' -}}
106
+ {%- endif -%}
107
+
108
+ {#- Render the message content: handle plain string or multimodal content like image/text -#}
109
+ {%- if message['content'] is string -%}
110
+ {%- set content = message['content'] -%}
111
+ {%- if '</think>' in content -%}
112
+ {%- set content = content.split('</think>')[-1] -%}
113
+ {%- endif -%}
114
+ {{ content | trim }}
115
+ {%- elif message['content'] is iterable -%}
116
+ {%- for item in message['content'] -%}
117
+ {%- if item['type'] == 'image' -%}
118
+ {{ '<start_of_image>' }}
119
+ {%- elif item['type'] == 'text' -%}
120
+ {%- set content = item['text'] -%}
121
+ {%- if '</think>' in content -%}
122
+ {%- set content = content.split('</think>')[-1] -%}
123
+ {%- endif -%}
124
+ {{ content | trim }}
125
+ {%- endif -%}
126
+ {%- endfor -%}
127
+ {%- else -%}
128
+ {{ raise_exception("Invalid content type") }}
129
+ {%- endif -%}
130
+
131
+ {#- Tool response end tag -#}
132
+ {%- if (message['role'] == 'tool') -%}
133
+ {{ '</tool_response>' -}}
134
+ {%- endif -%}
135
+
136
+ {#- Mark end of a single turn -#}
137
+ {{ '<end_of_turn>\n' }}
138
+ {%- endfor -%}
139
+
140
+ {#- If generation is to be triggered, add model prompt prefix -#}
141
+ {%- if add_generation_prompt -%}
142
+ {{'<start_of_turn>model\n'}}
143
+ {%- if enable_thinking is defined and enable_thinking is true -%}
144
+ {{- '<think>' -}}
145
+ {%- endif %}
146
+ {%- endif -%}
config.json ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "Gemma3ForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "attn_logit_softcapping": null,
8
+ "bos_token_id": 2,
9
+ "cache_implementation": "hybrid",
10
+ "eos_token_id": 1,
11
+ "final_logit_softcapping": null,
12
+ "head_dim": 256,
13
+ "hidden_activation": "gelu_pytorch_tanh",
14
+ "hidden_size": 3840,
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 15360,
17
+ "max_position_embeddings": 131072,
18
+ "model_type": "gemma3_text",
19
+ "num_attention_heads": 16,
20
+ "num_hidden_layers": 48,
21
+ "num_key_value_heads": 8,
22
+ "pad_token_id": 0,
23
+ "quantization": {
24
+ "group_size": 64,
25
+ "bits": 4
26
+ },
27
+ "quantization_config": {
28
+ "group_size": 64,
29
+ "bits": 4
30
+ },
31
+ "query_pre_attn_scalar": 256,
32
+ "rms_norm_eps": 1e-06,
33
+ "rope_local_base_freq": 10000.0,
34
+ "rope_scaling": {
35
+ "factor": 8.0,
36
+ "rope_type": "linear"
37
+ },
38
+ "rope_theta": 1000000.0,
39
+ "sliding_window": 1024,
40
+ "sliding_window_pattern": 6,
41
+ "torch_dtype": "bfloat16",
42
+ "transformers_version": "4.50.0.dev0",
43
+ "use_cache": false,
44
+ "vocab_size": 262145
45
+ }
generation_config.json ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token_id": 2,
3
+ "cache_implementation": "hybrid",
4
+ "do_sample": true,
5
+ "eos_token_id": [
6
+ 1,
7
+ 106
8
+ ],
9
+ "pad_token_id": 0,
10
+ "top_k": 64,
11
+ "top_p": 0.95,
12
+ "transformers_version": "4.50.0.dev0"
13
+ }
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:82e76df80a7af51acb31081635a3ad154267585b3bff306b5f854491f4156893
3
+ size 5367250560
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:62e9dd9bb0ce30408a5cf16afe9d39e2c1baeada788756015818c8a2f396b260
3
+ size 1818490740
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "boi_token": "<start_of_image>",
3
+ "bos_token": {
4
+ "content": "<bos>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false
9
+ },
10
+ "eoi_token": "<end_of_image>",
11
+ "eos_token": {
12
+ "content": "<end_of_turn>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "image_token": "<image_soft_token>",
19
+ "pad_token": {
20
+ "content": "<pad>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false
25
+ },
26
+ "unk_token": {
27
+ "content": "<unk>",
28
+ "lstrip": false,
29
+ "normalized": false,
30
+ "rstrip": false,
31
+ "single_word": false
32
+ }
33
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a872e3bb510a751b26bd65f61aad05f948c9cf78fe4f787aebd197b393cc4081
3
+ size 33384667
tokenizer.model ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
3
+ size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff