---
license: mit
---

# SWE-Swiss-32B-SFT

*[Image: SWE-Swiss Performance Chart]*

*Figure 1: Performance and model size comparison on SWE-bench Verified. Our 32B model achieves a top-tier score of 60.2%.*

**SWE-Swiss-32B-SFT** is a 32B-parameter model fine-tuned for high-performance software issue resolution. It is the product of multi-task fine-tuning. This repository contains the model weights for `SWE-Swiss-32B-SFT`. The model was developed by researchers from Peking University, ByteDance Seed, and The University of Hong Kong.

- **Base Model:** [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)
- **Blog:** [SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution](https://www.notion.so/SWE-Swiss-A-Multi-Task-Fine-Tuning-and-RL-Recipe-for-High-Performance-Issue-Resolution-21e174dedd4880ea829ed4c861c44f88)
- **GitHub:** https://github.com/zhenyuhe00/SWE-Swiss

## Model Description

`SWE-Swiss-32B` is trained to be a versatile software engineering model. The core idea behind its training recipe is the explicit modeling of three key skills:

* **Localization:** Pinpointing the exact files that need to be modified.
* **Repair:** Generating the correct code patch to resolve an issue.
* **Unit Test Generation:** Creating new tests to validate the proposed fix.

The model demonstrates how a principled training methodology can enable smaller models to reach performance levels previously seen only in much larger models, highlighting a path toward more efficient and accessible AI for software engineering.

## How to Get Started

### Transformers

You can use the `transformers` library to load and run `SWE-Swiss-32B-SFT`:

```python
import torch
import torch.nn as nn
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# We set the o_proj bias in the attention module to True to be compatible
# with our code base. The patch must be applied before loading the model so
# the o_proj bias parameters are created.
def apply_qwen2_bias_patch():
    Qwen2Attention = transformers.models.qwen2.modeling_qwen2.Qwen2Attention
    original_qwen2_attention_init = Qwen2Attention.__init__

    def patched_qwen2_attention_init(self, config, layer_idx):
        original_qwen2_attention_init(self, config, layer_idx)
        self.o_proj = nn.Linear(
            config.num_attention_heads * self.head_dim,
            config.hidden_size,
            bias=True,
        )

    Qwen2Attention.__init__ = patched_qwen2_attention_init

apply_qwen2_bias_patch()

model_id = "SWE-Swiss/SWE-Swiss-32B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
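Once the model is loaded, you can run a quick generation check. The snippet below is a minimal sketch, not part of the official recipe: the prompt is a placeholder (real use would feed the model localization, repair, or test-generation prompts), and `max_new_tokens=512` is an arbitrary illustrative choice. The sampling settings mirror the vLLM example further down.

```python
# Minimal generation sketch (illustrative prompt; assumed settings noted above).
messages = [
    {"role": "user", "content": "How are you?"}
]

# Build the chat-formatted input ids and move them to the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        inputs,
        max_new_tokens=512,  # arbitrary choice for illustration
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```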
### vLLM

You can also use the [`vLLM`](https://github.com/vllm-project/vllm) library to load and run `SWE-Swiss-32B-SFT`. First, clone the vLLM repository:

```
git clone https://github.com/vllm-project/vllm
cd vllm
git checkout v0.8.4  # or another version compatible with Qwen2
```

Then, change the [o_proj bias in the attention module](https://github.com/vllm-project/vllm/blob/v0.8.4/vllm/model_executor/models/qwen2.py#L148) to `True` and install vLLM:

```
# Please remember to change "bias=False" to "bias=True" before installing vLLM.
pip3 install -e .
```

Finally, use vLLM as usual:

```python
from vllm import LLM, SamplingParams

prompts = [
    "How are you?",
]
sampling_params = SamplingParams(temperature=0.6, top_p=0.95)

llm = LLM(
    model="SWE-Swiss/SWE-Swiss-32B-SFT",
    tensor_parallel_size=8,
    max_model_len=102400,
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

## Citation

```bibtex
@misc{SWESwiss2025,
  title  = {SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution},
  url    = {https://www.notion.so/SWE-Swiss-A-Multi-Task-Fine-Tuning-and-RL-Recipe-for-High-Performance-Issue-Resolution-21e174dedd4880ea829ed4c861c44f88},
  author = {He, Zhenyu and Yang, Qingping and Sheng, Wei and Zhong, Xiaojian and Zhang, Kechi and An, Chenxin and Shi, Wenlei and Cai, Tianle and He, Di and Chen, Jiaze and Xu, Jingjing and Wang, Mingxuan},
  year   = {2025}
}
```