Phi-3 Mini-4K-Instruct ONNX model for onnxruntime-web
This is the same model as the official Phi-3 ONNX model, with a few changes to make it work with onnxruntime-web:
- the model is fp16 with int4 block quantization for weights
- the 'logits' output is fp32
- the model uses MHA (multi-head attention) instead of GQA (grouped-query attention)
- the ONNX file and the external data file each need to stay below 2GB to be cacheable in Chromium
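
A minimal loading sketch with onnxruntime-web's WebGPU backend is shown below. It is illustrative only: the file names `model.onnx` and `model.onnx.data`, and the use of the `externalData` session option, are assumptions; check them against the actual files in this repository and the onnxruntime-web version you are using.

```ts
import * as ort from 'onnxruntime-web/webgpu';

// Sketch: create a WebGPU session for this model in the browser.
// File names below are assumptions; substitute the real ones from this repo.
async function loadSession(): Promise<ort.InferenceSession> {
  return ort.InferenceSession.create('model.onnx', {
    executionProviders: ['webgpu'],
    // The fp16 / int4-quantized weights live in a separate external data file,
    // declared here so the runtime fetches it alongside the graph. Keeping both
    // files under 2GB allows Chromium to cache them.
    externalData: [{ path: 'model.onnx.data', data: 'model.onnx.data' }],
  });
}
```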