Update README.md
README.md CHANGED
@@ -127,7 +127,7 @@ print(output[0]['generated_text'])
 Note that by default the model uses flash attention, which requires certain types of GPU to run. If you want to run the model on:
 
 + V100 or earlier generation GPUs: call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="eager"`
-+ Optimized inference: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
++ Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
 
 ## Responsible AI Considerations
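For readers landing on this commit, here is a minimal sketch of the V100 fallback the first bullet in the hunk describes. Only `attn_implementation="eager"` comes from the README text itself; the checkpoint id `microsoft/Phi-3-mini-128k-instruct` is assumed from the 128K ONNX link, and the dtype and device settings are illustrative rather than prescribed.

```python
# Minimal sketch: run the model on a V100-class GPU, where flash attention
# is unavailable, by forcing the "eager" attention implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"  # assumed from the 128K link above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,     # V100s support fp16 but not bf16
    attn_implementation="eager",   # overrides the flash-attention default
    device_map="auto",
    trust_remote_code=True,        # Phi-3 shipped custom modeling code at release
)

prompt = "Explain why flash attention needs newer GPUs."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```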