hzy00 committed (verified) · Commit 5ee18d8 · Parent(s): 4200188

Update README.md

Files changed (1): README.md (+32, −0)
## How to Get Started

### Transformers
You can use the `transformers` library to load and run `SWE-Swiss-32B-SFT`.

```python
# ... (unchanged loading code elided in this diff) ...
)
```

### vLLM
You can also use the [`vLLM`](https://github.com/vllm-project/vllm) library to load and run `SWE-Swiss-32B-SFT`.

First, clone the vLLM repository:
```shell
git clone https://github.com/vllm-project/vllm
cd vllm
git checkout v0.8.4  # or another version compatible with Qwen2
```

Then, change the [`o_bias` in the attention module](https://github.com/vllm-project/vllm/blob/v0.8.4/vllm/model_executor/models/qwen2.py#L148) to `True` and install vLLM:

```shell
# Remember to change "bias=False" to "bias=True" before installing vLLM.
pip3 install -e .
```
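
The bias edit is a one-line change in `qwen2.py`, but if you script it, a narrowly scoped replacement is safer than a blanket search-and-replace, since other layers in that file also pass `bias=False`. A minimal sketch (the regex assumes the `o_proj` call is written as in the v0.8.4 source; verify the result against line 148 before installing):

```python
import re
from pathlib import Path

def patch_o_bias(source: str) -> str:
    """Flip bias=False to bias=True only inside the o_proj constructor call."""
    # [^)]*? keeps the match inside the RowParallelLinear(...) argument list,
    # so bias=False arguments elsewhere in the file are left untouched.
    return re.sub(
        r"(self\.o_proj\s*=\s*RowParallelLinear\([^)]*?)bias=False",
        r"\1bias=True",
        source,
        count=1,
    )

# Apply to the checked-out file (path relative to the clone from the step above).
path = Path("vllm/model_executor/models/qwen2.py")
if path.exists():
    path.write_text(patch_o_bias(path.read_text()))
```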

Finally, use vLLM as usual:
```python
from vllm import LLM, SamplingParams

prompts = [
    "How are you?",
]
sampling_params = SamplingParams(temperature=0.6, top_p=0.95)
llm = LLM(model="SWE-Swiss/SWE-Swiss-32B-SFT", tensor_parallel_size=8, max_model_len=102400)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
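
If you prefer an HTTP endpoint over the offline `LLM` API, vLLM also ships an OpenAI-compatible server; a sketch using the same parallelism settings (flag names assumed from vLLM 0.8.x, not part of the original README):

```shell
# Launches an OpenAI-compatible server (vLLM listens on port 8000 by default).
vllm serve SWE-Swiss/SWE-Swiss-32B-SFT \
    --tensor-parallel-size 8 \
    --max-model-len 102400
```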

## Citation
```bibtex