Update README.md
README.md CHANGED
@@ -127,7 +127,7 @@ print(output[0]['generated_text'])
 Note that by default the model uses flash attention, which requires certain types of GPU to run. If you want to run the model on:
 
 + V100 or earlier generation GPUs: call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="eager"`
-+ Optimized inference: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
++ Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
 
 ## Responsible AI Considerations
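For readers landing on this commit, here is a minimal sketch of the V100 fallback the first bullet in the hunk describes. Only `attn_implementation="eager"` comes from the README text itself; the checkpoint id `microsoft/Phi-3-mini-128k-instruct` is assumed from the 128K ONNX link, and the dtype and device settings are illustrative rather than prescribed.

```python
# Minimal sketch: run the model on a V100-class GPU, where flash attention
# is unavailable, by forcing the "eager" attention implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"  # assumed from the 128K link above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,     # V100s support fp16 but not bf16
    attn_implementation="eager",   # overrides the flash-attention default
    device_map="auto",
    trust_remote_code=True,        # Phi-3 shipped custom modeling code at release
)

prompt = "Explain why flash attention needs newer GPUs."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```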