Update README.md

README.md (CHANGED)
@@ -7,390 +7,197 @@ pipeline_tag: text-generation
library_name: transformers
---
# GLM-4-32B-0414

## Introduction

The GLM family welcomes new members: the **GLM-4-32B-0414** series models, featuring 32 billion parameters. Their performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and they support user-friendly local deployment. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including substantial reasoning-oriented synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, we employed human preference alignment for dialogue scenarios. Using techniques such as rejection sampling and reinforcement learning, we further enhanced the model's performance in instruction following, engineering code, and function calling, strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in engineering code, Artifact generation, function calling, search-based Q&A, and report generation. On several benchmarks, such as code generation and specific Q&A tasks, GLM-4-32B-0414 achieves performance comparable to larger models like GPT-4o and DeepSeek-V3-0324 (671B).
**GLM-Z1-32B-0414** is a reasoning model with deep thinking capabilities. It was developed from GLM-4-32B-0414 through cold start, extended reinforcement learning, and further training on tasks including mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During training, we also introduced general reinforcement learning based on pairwise ranking feedback, which enhances the model's general capabilities.

**GLM-Z1-Rumination-32B-0414** is a deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Unlike typical deep-thinking models, the rumination model thinks longer and more deeply to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities together with their future development plans). Z1-Rumination is trained by scaling end-to-end reinforcement learning, with responses graded against ground-truth answers or rubrics, and it can use search tools during its deep-thinking process to handle complex tasks. The model shows significant improvements in research-style writing and complex tasks.

Finally, **GLM-Z1-9B-0414** is a surprise. We employed all the aforementioned techniques to train a small model (9B). GLM-Z1-9B-0414 exhibits excellent capabilities in mathematical reasoning and general tasks; its overall performance is top-ranked among open-source models of the same size. In resource-constrained scenarios in particular, it strikes an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment.

## Showcase
### Animation Generation

<table>
  <tr>
    <td style="text-align: center; font-size: 16px; font-weight: bold; padding: 10px; width: 420px;">
      GLM-Z1-32B-0414
    </td>
    <td style="text-align: center; font-size: 16px; font-weight: bold; padding: 10px; width: 420px;">
      GLM-4-32B-0414
    </td>
  </tr>
  <tr>
    <td style="vertical-align: top; padding: 10px; width: 420px;">
      <video src="https://github.com/user-attachments/assets/849ff9fd-b54d-4c74-9ee5-3412e1a09e32"
             style="width: 400px; height: 300px; object-fit: contain;" autoplay loop muted playsinline></video>
      <div style="margin-top: 10px; font-size: 14px; color: #333; width: 400px;">
        Write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically.
      </div>
    </td>
    <td style="vertical-align: top; padding: 10px; width: 420px;">
      <video src="https://github.com/user-attachments/assets/8dccdb9d-cc44-4732-b438-74a4e3cb9dfb"
             style="width: 400px; height: 300px; object-fit: contain;" autoplay loop muted playsinline></video>
      <div style="margin-top: 10px; font-size: 14px; color: #333; width: 400px;">
        Use HTML to simulate the scenario of a small ball released from the center of a rotating hexagon. Consider the collision between the ball and the hexagon's edges, the gravity acting on the ball, and assume all collisions are perfectly elastic. (Prompt translated from Chinese)
      </div>
    </td>
  </tr>
</table>
### Web Design

<table>
  <tr>
    <td style="text-align: center; font-size: 16px; font-weight: bold; padding: 10px; width: 420px;">
      GLM-4-32B-0414
    </td>
    <td style="text-align: center; font-size: 16px; font-weight: bold; padding: 10px; width: 420px;">
      GLM-4-32B-0414
    </td>
  </tr>
  <tr>
    <td style="vertical-align: top; padding: 10px; width: 420px;">
      <img src="https://github.com/user-attachments/assets/bd9c1fc1-c784-4e8f-9c76-5f7389a715f1"/>
      <div style="margin-top: 10px; font-size: 14px; color: #333; width: 400px;">
        Design a drawing board that supports custom function plotting, allowing adding and deleting custom functions, and assigning colors to functions. (Prompt translated from Chinese)
      </div>
    </td>
    <td style="vertical-align: top; padding: 10px; width: 420px;">
      <img src="https://github.com/user-attachments/assets/7ad12d52-9229-4278-8d1b-ffbf43e99070"/>
      <div style="margin-top: 10px; font-size: 14px; color: #333; width: 400px;">
        Design a UI for a mobile machine learning platform, which should include interfaces for training tasks, storage management, and personal statistics. The personal statistics interface should use charts to display the user's resource usage over a period. Use Tailwind CSS to style the page, and display these 3 mobile interfaces tiled on a single HTML page. (Prompt translated from Chinese)
      </div>
    </td>
  </tr>
</table>
### SVG Generation

<table>
  <tr>
    <td style="text-align: center; font-size: 16px; font-weight: bold; padding: 10px; width: 420px;">
      GLM-4-32B-0414
    </td>
    <td style="text-align: center; font-size: 16px; font-weight: bold; padding: 10px; width: 420px;">
      GLM-4-32B-0414
    </td>
  </tr>
  <tr>
    <td style="vertical-align: top; padding: 10px; width: 420px;">
      <img src="https://github.com/user-attachments/assets/9407e4c1-1876-4ab5-838c-839836fb418a"/>
      <div style="margin-top: 10px; font-size: 14px; color: #333; width: 400px;">
        Create a misty Jiangnan scene using SVG. (Prompt translated from Chinese)
      </div>
    </td>
    <td style="vertical-align: top; padding: 10px; width: 420px;">
      <img src="https://github.com/user-attachments/assets/bcce8c5a-cedf-45c8-b666-ddb023d5b49c"/>
      <div style="margin-top: 10px; font-size: 14px; color: #333; width: 400px;">
        Use SVG to illustrate the training process of an LLM. (Prompt translated from Chinese)
      </div>
    </td>
  </tr>
</table>
### Search-Based Q&A

For search-based Q&A, the model is given a system prompt of the following form (translated from Chinese):

```
Answer the user's question based on the provided search results.

1. Make full use of and organize the collected information instead of simply copy-pasting, and generate a professional, in-depth answer that meets the user's requirements.
2. When the provided information is sufficient, make your answer as long as possible; starting from the user's intent, give a reply with ample information and multiple perspectives.
3. Note that not all search results are closely relevant to the user's question; screen, filter, and use them carefully.
4. Answers to objective questions are usually very short; you may add one or two sentences of related information to enrich the content.
5. Make sure your reply is well formatted and highly readable. For multi-entity comparisons or enumerations, use list formats to help the user better understand the information.
6. Unless the user requests otherwise, answer in the same language as the user's question.
7. Where appropriate, cite search results at the end of sentences using a format such as 【0†source】.
```
The conversation is then constructed in the following message format:

```json
[
    {
        "role": "user",
        "content": "Explore the common characteristics of children's literature, with a focus on its narrative techniques and thematic tendencies. This includes narrative techniques: common approaches in children's literature such as first-person, third-person, omniscient narrator, and interactive narration, and their influence on young readers. It also includes thematic tendencies: recurring themes in children's literature such as growth, adventure, friendship, and family, with an analysis of how these themes impact children's cognitive and emotional development. Additionally, other universal features such as the use of personification, repetitive language, symbolism and metaphor, and educational value should be considered. Please provide a detailed analytical report based on academic research, classic examples of children's literature, and expert opinions."
    },
    {
        "role": "observation",
        "content": "【{id}†{title}†{url}】\n{content}"
    },
    ...
]
```

For the above prompt, we use an internal or external search model to obtain the search results. Using the format shown above, the model can generate the following analysis report:
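Each search hit is packed into the observation content as a 【{id}†{title}†{url}】 header followed by the hit's text. The formatting step can be sketched as follows; the `format_search_results` helper and the result fields here are hypothetical illustrations, not part of the official tooling:

```python
def format_search_results(results):
    """Join search hits into one observation string: a
    【{id}†{title}†{url}】 header per hit, followed by its content."""
    blocks = []
    for i, r in enumerate(results):
        blocks.append(f"【{i}†{r['title']}†{r['url']}】\n{r['content']}")
    return "\n\n".join(blocks)

# Hypothetical search hits returned by a search backend.
results = [
    {"title": "Narrative theory", "url": "https://example.com/a",
     "content": "Narrative theory focuses on the 'how' of storytelling."},
    {"title": "Cognitive approaches", "url": "https://example.com/b",
     "content": "Cognitive science offers new lenses on children's reading."},
]
observation = {"role": "observation", "content": format_search_results(results)}
```

The model can then cite individual hits as 【0†source】, 【1†source】, and so on, since each hit's index is embedded in its header.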
<div style="height: 400px; width: 100%; overflow: auto; border: 5px solid #ddd; padding: 20px;">

| Narrative Technique | Characteristics | Impact on Young Readers | Representative Works |
|----------------------|--------------------|----------------------------|---------------------|
| **First-Person** | Told from the child protagonist's perspective | Enhances immersion, fosters empathy | *Charlotte's Web*, *The Straw House* |
| **Third-Person** | Omniscient or limited perspective | Expands horizons, develops comprehensive understanding | *Harry Potter* series |
| **Narration** | Direct authorial intrusion into the narrative | Aids comprehension, conveys values | *Andersen's Fairy Tales* |
| **Interactive** | Encourages reader participation in creation | Cultivates agency and creative thinking | Children's interactive reading apps |

| Theme | Typical Content | Impact on Cognitive Development | Impact on Emotional Development | Representative Works |
|---------------|---------------------------|-------------------------------------|-------------------------------------|---------------------|
| **Growth** | Awakening of self-awareness, psychological trials and breakthroughs | Establishes self-concept, fosters problem-solving skills | Shapes positive self-identity, enhances psychological resilience | *The Straw House*, *Pippi Longstocking* |
| **Adventure** | Exploring the unknown, overcoming challenges | Expands imaginative space, exercises logical thinking | Cultivates courage and perseverance | *Robinson Crusoe*, *The Adventures of Tom Sawyer* |
| **Friendship** | Peer interactions, mutual aid and cooperation | Learns social skills, understands interpersonal dynamics | Develops empathy, builds a sense of belonging | *Charlotte's Web*, *Totto-Chan: The Little Girl at the Window* |
| **Family** | Parent-child relationships, sibling bonds | Understands social roles, learns communication skills | Gains emotional support, establishes secure attachments | *Alice's Adventures in Wonderland*, *The Little Prince* |

From an academic research perspective, children's literature studies should emphasize the application of narrative theory, since narrative theory focuses on the "how" of storytelling, that is, narrative form, which aligns closely with the research focus of children's literature【0†source】. At the same time, cognitive research methods provide new perspectives for children's literature studies: by combining cognitive science with literary theory, we can gain a deeper understanding of how children's literature influences children's thinking and cognitive development【4†source】. Future research should continue to explore the application of these theoretical methods in children's literature studies while paying attention to the intersection and integration of children's literature with emerging fields such as digital media and interdisciplinary education.

</div>
### Function Call

The message format for a tool call emitted by the model is as follows:

```json
{
    "role": "assistant",
    "metadata": function_name,
    "content": json.dumps(call_arguments, ensure_ascii=False)
}
```

The message format for feeding the tool's result back to the model is as follows:

```json
{
    "role": "observation",
    "content": json.dumps(tool_response, ensure_ascii=False) if not isinstance(tool_response, str) else tool_response
}
```
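Put together, appending a tool call and its result to a running `message` list looks like this (a minimal sketch; the `get_weather` tool and its arguments are hypothetical):

```python
import json

message = []

# Assistant turn carrying the tool call: the function name goes into
# "metadata", and the arguments are JSON-serialized into "content".
call_arguments = {"city": "Beijing"}
message.append({
    "role": "assistant",
    "metadata": "get_weather",
    "content": json.dumps(call_arguments, ensure_ascii=False),
})

# Tool result fed back as an "observation" turn; non-string responses
# are JSON-serialized, plain strings are passed through unchanged.
tool_response = {"city": "Beijing", "weather": "sunny"}
message.append({
    "role": "observation",
    "content": json.dumps(tool_response, ensure_ascii=False)
    if not isinstance(tool_response, str) else tool_response,
})
```

Note that `ensure_ascii=False` keeps non-ASCII characters (e.g., Chinese city names) intact instead of escaping them.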
```python
# Multi-turn driver loop (the model/tokenizer setup and the tool
# definitions that precede this excerpt are omitted).
while True:
    generate_kwargs = {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "max_new_tokens": 1024,
        "do_sample": True,
    }
    out = model.generate(**generate_kwargs)
    generate_resp = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:-1], skip_special_tokens=False)
    stop_sequence = tokenizer.decode(out[0][-1:], skip_special_tokens=False)
    if stop_sequence == "<|user|>":
        print(f"Assistant Response: {generate_resp.strip()}")
        break

    function_calls = []
    for m in generate_resp.split("<|assistant|>"):
        fc_decode = is_function_call(m.strip())
        if fc_decode:
            message.append({"role": "assistant", "metadata": fc_decode['name'], "content": json.dumps(fc_decode['arguments'], ensure_ascii=False)})
            print(f"Function Call: {fc_decode}")
            function_calls.append(fc_decode)
        else:
            message.append({"role": "assistant", "content": m})
            print(f"Assistant Response: {m.strip()}")

    for fc in function_calls:
        function_response = realtime_aqi(
            city=fc["arguments"]["city"],
        )
        print(f"Function Response: {function_response}")
        message.append({"role": "observation", "content": function_response})
```
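The loop above relies on an `is_function_call` helper that decides whether an assistant segment is a tool call and parses it into `{'name': ..., 'arguments': ...}`. Its definition is not shown in this README, so the following is only a sketch, assuming the model emits the function name on the first line and a JSON argument object on the next:

```python
import json

def is_function_call(segment: str):
    """Parse an assistant segment as a tool call, returning
    {'name': ..., 'arguments': ...}, or None for plain text.
    The first-line-name / JSON-arguments layout is an assumption."""
    parts = segment.strip().split("\n", 1)
    if len(parts) != 2:
        return None
    name, args = parts[0].strip(), parts[1].strip()
    if not name or not args.startswith("{"):
        return None
    try:
        return {"name": name, "arguments": json.loads(args)}
    except json.JSONDecodeError:
        return None
```

For example, `is_function_call('realtime_aqi\n{"city": "Beijing"}')` yields `{'name': 'realtime_aqi', 'arguments': {'city': 'Beijing'}}`, while a plain-text reply yields `None`.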
## Evaluation Results

<div style="text-align: center;">
  <img src="https://raw.githubusercontent.com/THUDM/GLM-4/refs/heads/main/resources/Bench-32B.png" style="width: 80%;" />
</div>

### GLM-4-0414 Series

| Model | IFEval | BFCL-v3 (Overall) | BFCL-v3 (MultiTurn) | TAU-Bench (Retail) | TAU-Bench (Airline) | SimpleQA | HotpotQA |
| ---------------- | ------ | ----------------- | ------------------- | ------------------ | ------------------- | -------- | -------- |
| Qwen2.5-Max | 85.6 | 50.9 | 30.5 | 58.3 | 22.0 | 79.0 | 52.8 |
| GPT-4o-1120 | 81.9 | 69.6 | 41.0 | 62.8 | 46.0 | 82.8 | 63.9 |
| DeepSeek-V3-0324 | 83.4 | 66.2 | 35.8 | 60.7 | 32.4 | 82.6 | 54.6 |
| DeepSeek-R1 | 84.3 | 57.5 | 12.4 | 33.0 | 37.3 | 83.9 | 63.1 |
| GLM-4-32B-0414 | 87.6 | 69.6 | 41.5 | 68.7 | 51.2 | 88.1 | 63.8 |

> For `SimpleQA` and `HotpotQA`, we sampled nearly 500 test cases from each test set, provided all models with basic `search` and `click` tools, kept all other settings consistent, and averaged the results over 3 runs.

| Model | Framework | [SWE-bench Verified](https://openai.com/index/introducing-swe-bench-verified/) | [SWE-bench Verified mini](https://github.com/mariushobbhahn/SWEBench-verified-mini) |
|---|---|---|---|
| GLM-4-32B-0414 | Moatless<sup>[1]</sup> | 33.8 | 38.0 |
| GLM-4-32B-0414 | Agentless<sup>[2]</sup> | 30.7 | 34.0 |
| GLM-4-32B-0414 | OpenHands<sup>[3]</sup> | 27.2 | 28.0 |

[1] [Moatless v0.0.3](https://github.com/aorwall/moatless-tools) used the following parameters: `response_format="react", thoughts_in_action=False, max_iterations=30`. Failed trajectories were not retried; other settings are default.

[2] [Agentless v1.5.0](https://github.com/OpenAutoCoder/Agentless) used [BGE](https://github.com/FlagOpen/FlagEmbedding/blob/master/README.md) as the embedding model and [FAISS](https://github.com/facebookresearch/faiss) for similarity search. To speed up patch verification while maintaining performance, the timeout for running a single instance was reduced from the default 300s to 180s.

[3] [OpenHands v0.29.1](https://github.com/All-Hands-AI/OpenHands/tree/main) did not use YaRN context extension; instead, runs were limited to a maximum of 60 iterations and the history was summarized to stay within the 32K context limit. Summarization was configured as `llm_config="condenser", keep_first=1, max_size=32`. Failed trajectories were not retried.
library_name: transformers
---

# Model Card for Model ID
<!-- Provide a quick summary of what the model is/does. -->

This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
79 |
|
80 |
+
## Training Details
|
81 |
|
82 |
+
### Training Data
|
83 |
|
84 |
+
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
85 |
|
86 |
+
[More Information Needed]
|
87 |
|
88 |
+
### Training Procedure
|
89 |
|
90 |
+
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
91 |
|
92 |
+
#### Preprocessing [optional]
|
93 |
|
94 |
+
[More Information Needed]
|
95 |
|
|
|
96 |
|
97 |
+
#### Training Hyperparameters
|
98 |
|
99 |
+
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
100 |
|
101 |
+
#### Speeds, Sizes, Times [optional]
|
102 |
|
103 |
+
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
|
|
|
|
|
104 |
|
105 |
+
[More Information Needed]
|
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]