Based on https://huggingface.co/microsoft/Phi-3.5-mini-instruct
Converted to ONNX using https://github.com/microsoft/onnxruntime-genai with the command:
`python -m onnxruntime_genai.models.builder -m microsoft/Phi-3.5-mini-instruct -o Phi-3.5-mini-instruct-onnx -e webgpu -c cache-dir -p int4 --extra_options int4_block_size=32 int4_accuracy_level=4`
The generated external data file (model.onnx.data) is larger than 2 GB, which ORT-Web cannot load, so I used an additional Python script to move some of the tensor data back into model.onnx.