hzy00 committed (verified) · Commit 5ee18d8 · Parent(s): 4200188

Update README.md

Files changed (1): README.md (+32, −0)
## How to Get Started

### Transformers
You can use the `transformers` library to load and run `SWE-Swiss-32B-SFT`.

```python
# ... (unchanged loading code elided in this diff) ...
)
```

### vLLM
You can also use the [`vLLM`](https://github.com/vllm-project/vllm) library to load and run `SWE-Swiss-32B-SFT`.

First, clone the vLLM repository:
```shell
git clone https://github.com/vllm-project/vllm
cd vllm
git checkout v0.8.4  # or another version compatible with Qwen2
```

Then, change the [`o_bias` in the attention module](https://github.com/vllm-project/vllm/blob/v0.8.4/vllm/model_executor/models/qwen2.py#L148) to `True` and install vLLM:

```shell
# Remember to change "bias=False" to "bias=True" before installing vLLM.
pip3 install -e .
```
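
The bias edit is a one-line change in `qwen2.py`, but if you script it, a narrowly scoped replacement is safer than a blanket search-and-replace, since other layers in that file also pass `bias=False`. A minimal sketch (the regex assumes the `o_proj` call is written as in the v0.8.4 source; verify the result against line 148 before installing):

```python
import re
from pathlib import Path

def patch_o_bias(source: str) -> str:
    """Flip bias=False to bias=True only inside the o_proj constructor call."""
    # [^)]*? keeps the match inside the RowParallelLinear(...) argument list,
    # so bias=False arguments elsewhere in the file are left untouched.
    return re.sub(
        r"(self\.o_proj\s*=\s*RowParallelLinear\([^)]*?)bias=False",
        r"\1bias=True",
        source,
        count=1,
    )

# Apply to the checked-out file (path relative to the clone from the step above).
path = Path("vllm/model_executor/models/qwen2.py")
if path.exists():
    path.write_text(patch_o_bias(path.read_text()))
```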

Finally, use vLLM as usual:
```python
from vllm import LLM, SamplingParams

prompts = [
    "How are you?",
]
sampling_params = SamplingParams(temperature=0.6, top_p=0.95)
llm = LLM(model="SWE-Swiss/SWE-Swiss-32B-SFT", tensor_parallel_size=8, max_model_len=102400)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
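
If you prefer an HTTP endpoint over the offline `LLM` API, vLLM also ships an OpenAI-compatible server; a sketch using the same parallelism settings (flag names assumed from vLLM 0.8.x, not part of the original README):

```shell
# Launches an OpenAI-compatible server (vLLM listens on port 8000 by default).
vllm serve SWE-Swiss/SWE-Swiss-32B-SFT \
    --tensor-parallel-size 8 \
    --max-model-len 102400
```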

## Citation
```bibtex