---
language:
  - multilingual
license: mit
license_link: https://huggingface.co/moonshotai/Kimi-Dev-72B/blob/main/LICENSE.md
library_name: transformers
pipeline_tag: text-generation
tags:
- GPTQ
- Int8
- vLLM
- code
- swebench
- software
- issue-resolving
base_model:
  - moonshotai/Kimi-Dev-72B
base_model_relation: quantized
---
# Kimi-Dev-72B-GPTQ-Int8
Base model: [moonshotai/Kimi-Dev-72B](https://huggingface.co/moonshotai/Kimi-Dev-72B)

<i>Calibrated using the <a href="https://huggingface.co/datasets/timdettmers/openassistant-guanaco/blob/main/openassistant_best_replies_eval.jsonl">openassistant_best_replies_eval.jsonl</a> dataset.</i>
<br>
<i>The quantization configuration is as follows:</i>

```python
# QuantizeConfig matches the GPTQModel library's API; the import below is an
# assumption about the toolchain, inferred from the class name and signature.
from gptqmodel import QuantizeConfig

quant_config = QuantizeConfig(bits=8, group_size=128, desc_act=False)
```
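
For context, a minimal end-to-end sketch of how a checkpoint like this could be produced with the GPTQModel library is shown below. The file path, output directory, and the assumption that each JSONL record carries a `text` field are illustrative, not taken from this repo's published recipe.

```python
import json

from gptqmodel import GPTQModel, QuantizeConfig

# Illustrative paths/names, not part of this repo's published recipe.
calib_file = "openassistant_best_replies_eval.jsonl"
base_model = "moonshotai/Kimi-Dev-72B"
out_dir = "Kimi-Dev-72B-GPTQ-Int8"

# Load calibration texts; each JSONL line is assumed to be {"text": "..."}.
with open(calib_file) as f:
    calibration_dataset = [json.loads(line)["text"] for line in f]

quant_config = QuantizeConfig(bits=8, group_size=128, desc_act=False)

# GPTQModel runs layer-by-layer GPTQ calibration over the dataset.
model = GPTQModel.load(base_model, quant_config)
model.quantize(calibration_dataset)
model.save(out_dir)
```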

### 【vLLM Startup Command】
```bash
vllm serve JunHowie/Kimi-Dev-72B-GPTQ-Int8
```
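
Note that a 72B Int8 checkpoint generally does not fit on a single GPU, so you may need vLLM's `--tensor-parallel-size` flag to shard it across devices. Once running, the server exposes an OpenAI-compatible API; below is a minimal client sketch, assuming the default port 8000 and the official `openai` Python client.

```python
# Minimal client sketch against the OpenAI-compatible endpoint exposed by
# `vllm serve` (default http://localhost:8000/v1); the API key is unused.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.chat.completions.create(
    model="JunHowie/Kimi-Dev-72B-GPTQ-Int8",
    messages=[{"role": "user", "content": "Explain GPTQ quantization in one sentence."}],
)
print(completion.choices[0].message.content)
```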


### 【Model Download】

```python
from huggingface_hub import snapshot_download
snapshot_download('JunHowie/Kimi-Dev-72B-GPTQ-Int8', cache_dir="your_local_path")
```
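
The downloaded checkpoint can also be used for offline batch inference through vLLM's Python API. A sketch is shown below; `tensor_parallel_size=4` is an illustrative assumption for a multi-GPU node, and the repo id can be replaced with the local snapshot path returned by `snapshot_download` above.

```python
from vllm import LLM, SamplingParams

# Offline batch inference sketch (not the only supported path).
llm = LLM(model="JunHowie/Kimi-Dev-72B-GPTQ-Int8", tensor_parallel_size=4)
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Write a Python function that reverses a string."], sampling_params
)
print(outputs[0].outputs[0].text)
```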

### 【Overview】
<!-- # Kimi-Dev -->

<div align="center">
  <img src="./assets/main_logo.png" alt="Kimi Logo" width="400" />
<h2><a href="https://moonshotai.github.io/Kimi-Dev/">
Introducing Kimi-Dev: <br>A Strong and Open-source Coding LLM for Issue Resolution</a></h2>
<b>Kimi-Dev Team</b>
<br>

</div>
<div align="center">
  <a href="">
    <b>📄 Tech Report (Coming soon...)</b>
  </a> &nbsp;|&nbsp;
  <a href="https://github.com/MoonshotAI/Kimi-Dev">
    <b>📄 Github</b>
  </a> &nbsp;
</div>

<br>
<br>

<!-- https://github.com/MoonshotAI/Kimi-Dev -->

We introduce Kimi-Dev-72B, our new open-source coding LLM for software engineering tasks. Kimi-Dev-72B achieves a new state-of-the-art on SWE-bench Verified among open-source models.

- Kimi-Dev-72B achieves 60.4% on SWE-bench Verified, surpassing the runner-up and setting a new state-of-the-art result among open-source models.


- Kimi-Dev-72B is optimized via large-scale reinforcement learning. It autonomously patches real repositories in Docker and gains rewards only when the entire test suite passes. This ensures correct and robust solutions, aligning with real-world development standards.


- Kimi-Dev-72B is available for download and deployment on Hugging Face and GitHub. We welcome developers and researchers to explore its capabilities and contribute to development.


<div align="center">
  <img src="./assets/open_performance_white.png" alt="Kimi Logo" width="600" />
  <p><b>Performance of Open-source Models on SWE-bench Verified.</b></p>

</div>



## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "moonshotai/Kimi-Dev-72B"

# Load the model across available devices with an automatically chosen dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Citation
```bibtex
@misc{kimi_dev_72b_2025,
  title  = {Introducing Kimi-Dev: A Strong and Open-source Coding LLM for Issue Resolution},
  author = {{Kimi-Dev Team}},
  year   = {2025},
  month  = {June},
  url    = {https://www.moonshot.cn/Kimi-Dev}
}
```