MetaphoricalCode committed on
Commit 032ef70 · verified · 1 Parent(s): bb9c780

Upload 15 files

.gitattributes CHANGED
@@ -33,3 +33,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ assets/ball.gif filter=lfs diff=lfs merge=lfs -text
+ assets/benchmark.png filter=lfs diff=lfs merge=lfs -text
+ assets/count.png filter=lfs diff=lfs merge=lfs -text
+ assets/diamond.png filter=lfs diff=lfs merge=lfs -text
+ assets/param-aime2024.jpeg filter=lfs diff=lfs merge=lfs -text
+ assets/param-lcb.jpeg filter=lfs diff=lfs merge=lfs -text
+ assets/writing.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,161 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: text-generation
+ library_name: transformers
+ base_model:
+ - a-m-team/AM-Thinking-v1
+ base_model_relation: quantized
+ ---
+ ## Quantized using the default exllamav3 (0.0.4) quantization process
+
+ - Original model: https://huggingface.co/a-m-team/AM-Thinking-v1
+ - exllamav3: https://github.com/turboderp-org/exllamav3
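+
+ The exl3 settings used for this quant (4.0 bits per weight, 6-bit head, converter 0.0.4) are recorded under `quantization_config` in `config.json`. A minimal sketch for inspecting them, assuming transformers simply surfaces the extra block as a config attribute; the local path is hypothetical:
+
+ ```python
+ from transformers import AutoConfig
+
+ # Read the quantization metadata stored alongside the weights.
+ cfg = AutoConfig.from_pretrained("./AM-Thinking-v1-exl3")  # hypothetical local path
+ print(cfg.quantization_config)
+ # expected: {'quant_method': 'exl3', 'version': '0.0.4', 'bits': 4.0, 'head_bits': 6, ...}
+ ```
+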
+ ---
+ # AM‑Thinking‑v1: Advancing the Frontier of Reasoning at 32B Scale
+ * 2025-05-10 · a-m‑team
+
+ <p align="center">
+ 🤗 <a href="https://huggingface.co/a-m-team">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://arxiv.org/abs/2505.08311">Paper</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://a-m-team.github.io/am-thinking-v1/">Blog</a>
+ </p>
+
+ ## 🚀 Introduction
+
+ We release **AM-Thinking‑v1**, a 32B dense language model focused on enhancing reasoning capabilities.
+ Built on Qwen 2.5‑32B‑Base, AM-Thinking‑v1 shows strong performance on reasoning benchmarks, comparable to much larger MoE models such as **DeepSeek‑R1**, **Qwen3‑235B‑A22B**, and **Seed1.5-Thinking**, and to larger dense models such as **Nemotron-Ultra-253B-v1**.
+
+ <div style="text-align: center;">
+ <img src="assets/benchmark.png" alt="benchmark" style="width: 90%;">
+ </div>
+
+ ## 🧩 Why Another 32B Reasoning Model Matters
+
+ Large Mixture‑of‑Experts (MoE) models such as **DeepSeek‑R1** or **Qwen3‑235B‑A22B** dominate leaderboards—but they also demand clusters of high‑end GPUs. Many teams just need *the best dense model that fits on a single card*.
+ **AM‑Thinking‑v1** fills that gap **while remaining fully based on open-source components**:
+
+ * **Outperforms DeepSeek‑R1** on AIME’24/’25 & LiveCodeBench and **approaches Qwen3‑235B‑A22B** despite having roughly 1/7 the parameter count.
+ * **Built on the publicly available Qwen 2.5‑32B‑Base**, with RL training queries that are likewise publicly available.
+ * Shows that with a **well‑designed post‑training pipeline** (SFT + dual‑stage RL) you can squeeze flagship‑level reasoning out of a 32B dense model.
+ * **Deploys on one A100‑80 GB** with deterministic latency—no MoE routing overhead.
+
+ <div style="text-align: center;">
+ <img src="assets/param-aime2024.jpeg" alt="AIME 2024" style="width: 90%; margin-bottom: 20px;">
+ <img src="assets/param-lcb.jpeg" alt="LiveCodeBench" style="width: 90%;">
+ <div style="margin-top: 10px;">
+ <em>AM-Thinking-v1 achieves strong reasoning performance with significantly fewer parameters.</em>
+ </div>
+ </div>
+
+ ## 🛠️ Use Cases
+
+ ### 1) Code Generation
+ <pre style="font-family: 'Times New Roman', serif; font-size: 12px; border: 1px solid black; padding: 10px; font-style: italic;">
+ PROMPT:
+ write a python script for a bouncing red ball within a triangle, make sure to handle collision detection properly. make the triangle slowly rotate. implement it in python. make sure ball stays within the triangle
+ </pre>
+ <div style="text-align: center;">
+ <img src="assets/ball.gif" alt="Bouncing Red Ball" width="50%">
+ </div>
+
+ ### 2) Logic
+
+ <div style="text-align: center;">
+ <img src="assets/diamond.png" alt="diamond" width="90%">
+ </div>
+
+ ### 3) Writing
+ <div style="text-align: center;">
+ <img src="assets/writing.png" alt="sushi" width="90%">
+ </div>
+
+ ## ⚡ Quick start
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "a-m-team/AM-Thinking-v1"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+
+ prompt = "How can I find inner peace?"
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=49152
+ )
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
+
+ response = tokenizer.decode(output_ids, skip_special_tokens=True)
+ think_content = response.split("<think>")[1].split("</think>")[0]
+ answer_content = response.split("<answer>")[1].split("</answer>")[0]
+
+ print(f"user prompt: {prompt}")
+ print(f"model thinking: {think_content}")
+ print(f"model answer: {answer_content}")
+ ```
+ > Note: We have included the system prompt in the tokenizer configuration, as it was used during both the SFT and RL stages. To ensure consistent output quality, we recommend including the same system prompt during actual usage; otherwise, the model's responses may be significantly affected.
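+
+ The `split`-based extraction above assumes the response always contains both a `<think>…</think>` and an `<answer>…</answer>` pair; if generation is truncated (for example by `max_new_tokens`) it raises an `IndexError`. A more defensive sketch, continuing from the snippet above (not part of the original card):
+
+ ```python
+ import re
+ from typing import Optional
+
+ def extract_tag(response: str, tag: str) -> Optional[str]:
+     """Return the content of <tag>...</tag>, or None if the pair is missing."""
+     match = re.search(rf"<{tag}>(.*?)</{tag}>", response, re.DOTALL)
+     return match.group(1).strip() if match else None
+
+ think_content = extract_tag(response, "think")
+ answer_content = extract_tag(response, "answer") or response  # fall back to the raw text
+ ```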
+
+ ### Quantized versions for compact devices
+ A series of quantized versions of the AM-Thinking-v1 model, for use with [llama.cpp](https://github.com/ggml-org/llama.cpp) and [Ollama](https://github.com/ollama/ollama), is available at [AM-Thinking-v1-gguf](https://huggingface.co/a-m-team/AM-Thinking-v1-gguf).
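+
+ For example, a minimal sketch using the `llama-cpp-python` bindings (not mentioned in the original card), assuming one of the GGUF files has been downloaded locally; the filename and quant level below are hypothetical:
+
+ ```python
+ from llama_cpp import Llama
+
+ # Load a quantized GGUF build of the model (path and quant level are examples).
+ llm = Llama(model_path="./AM-Thinking-v1-Q4_K_M.gguf", n_ctx=8192)
+
+ out = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "How can I find inner peace?"}],
+     temperature=0.6,  # sampling defaults taken from this repo's generation_config.json
+     top_p=0.95,
+ )
+ print(out["choices"][0]["message"]["content"])
+ ```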
+
+ ## 🔧 Post-training pipeline
+
+ To achieve its strong reasoning ability, AM‑Thinking‑v1 goes through a carefully designed post-training pipeline.
+ Below we describe the key stages involved in turning a base model into a high-performing reasoner:
+
+ **Step 1 – Cold‑start SFT.**
+ We begin with the open-sourced **Qwen 2.5‑32B‑Base** and run a broad supervised fine‑tune on a blended training dataset of math, code and open‑domain chat. This endows the model with a "think‑then‑answer" behavioural pattern and equips it with an initial capacity for reasoning.
+
+ **Step 2 – Pass‑rate‑aware data curation.**
+ Before any RL, the SFT model is evaluated on every math‑ and code‑oriented training query. For each item we log a pass rate; only those with **0 < pass‑rate < 1** are kept. In effect we discard problems the model already masters and those it utterly fails, concentrating learning on genuinely informative cases.
+
+ **Step 3 – Reinforcement learning.**
+ We adopt a two‑stage GRPO scheme: Stage 1 trains only on math and code queries. Once it converges, Stage 2 starts by removing every query the model answered 100% correctly in Stage 1 and adjusting key hyper‑parameters such as maximum generation length and learning rate.
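+
+ A toy illustration of the pass‑rate filtering described in Steps 2 and 3 (an illustrative sketch with made-up data, not the team's actual pipeline code):
+
+ ```python
+ # Pass rates measured by sampling the SFT model on each training query (toy data).
+ sft_pass_rate = {"easy_sum": 1.0, "hard_geometry": 0.4, "unsolved_proof": 0.0}
+
+ # Step 2: keep only genuinely informative items, i.e. 0 < pass rate < 1.
+ rl_pool = [q for q, p in sft_pass_rate.items() if 0.0 < p < 1.0]
+ print(rl_pool)  # ['hard_geometry']
+
+ # Step 3, Stage 2: additionally drop any query Stage 1 now solves every time.
+ stage1_pass_rate = {"hard_geometry": 0.9}
+ stage2_pool = [q for q in rl_pool if stage1_pass_rate.get(q, 0.0) < 1.0]
+ print(stage2_pool)  # ['hard_geometry']
+ ```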
+
+ ## ⚠️ Limitations
+
+ While AM‑Thinking‑v1 excels at pure language reasoning and open‑domain chat, it has not yet been trained for structured function‑calling or tool‑use workflows, which restricts its usefulness in agent‑style applications that must act on external systems.
+ Improving the model's ability to follow complex instructions is also an important direction for our future work.
+ In addition, our safety alignment is still at an early stage, so more rigorous red‑teaming is required to reduce potential harms.
+
+ ## 📚 Citation
+ The a-m-team is an internal team at Beike (Ke.com), dedicated to exploring AGI technology.
+ If you find our work helpful, feel free to cite us.
+
+ ```bibtex
+ @misc{ji2025amthinkingv1advancingfrontierreasoning,
+       title={AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale},
+       author={Yunjie Ji and Xiaoyu Tian and Sitong Zhao and Haotian Wang and Shuaiting Chen and Yiping Peng and Han Zhao and Xiangang Li},
+       year={2025},
+       eprint={2505.08311},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2505.08311},
+ }
+ ```
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "</tool_call>": 151658,
+   "<tool_call>": 151657,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "eos_token_id": 151643,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 27648,
+   "max_position_embeddings": 131072,
+   "max_window_layers": 64,
+   "model_type": "qwen2",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 64,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.46.0",
+   "use_cache": false,
+   "use_sliding_window": false,
+   "vocab_size": 152064,
+   "quantization_config": {
+     "quant_method": "exl3",
+     "version": "0.0.4",
+     "bits": 4.0,
+     "head_bits": 6,
+     "calibration": {
+       "rows": 100,
+       "cols": 2048
+     },
+     "out_scales": "auto"
+   }
+ }
generation_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "bos_token_id": 151643,
+   "pad_token_id": 151643,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "temperature": 0.6,
+   "top_p": 0.95,
+   "repetition_penalty": 1.0
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94db585972f95b6a03d186fece2c9b6b5c9f720f952db68a7197f6ef1f574ab6
+ size 8391760800
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:048d2bb43d10d89a8558206ec36216df1520aaa19e15f1500783d069d6c08b66
+ size 8543277872
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1046d9465889a03c71b9904c227b8d900add1d3762d13e76df91ba780faed523
+ size 828344304
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
quantization_config.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are a helpful assistant. To answer the user\\'s question, you first think about the reasoning process and then provide the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are a helpful assistant. To answer the user\\'s question, you first think about the reasoning process and then provide the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "padding_side": "right",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff