Phi-3 Mini-4K-Instruct ONNX model for onnxruntime-web

This is the same model as the official Phi-3 ONNX model, with a few changes to make it work with onnxruntime-web:

  1. The model is fp16, with int4 block quantization for the weights.
  2. The 'logits' output is fp32.
  3. The model uses MHA instead of GQA.
  4. The ONNX file and the external data file both need to stay below 2GB to be cacheable in Chromium.
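
To illustrate what "int4 block quantization" in item 1 means, here is a minimal TypeScript sketch of symmetric block-wise 4-bit quantization and dequantization. It is not the code used to produce this model: the function names `quantizeInt4` and `dequantizeInt4`, the fixed zero point of 8, the low-nibble-first packing order, and the block size are all assumptions chosen for illustration, and the scales are kept in a `Float32Array` here rather than fp16.

```typescript
// Illustrative sketch only. Zero point, nibble order, and block size are
// assumptions, not the verified layout of this model's weights.
const ZERO_POINT = 8; // assumed midpoint of the unsigned 4-bit range [0, 15]

interface QuantizedInt4 {
  packed: Uint8Array;   // two 4-bit values per byte
  scales: Float32Array; // one scale per block
}

// Quantize `values` in blocks of `blockSize` elements (blockSize must be even).
function quantizeInt4(values: Float32Array, blockSize: number): QuantizedInt4 {
  const numBlocks = Math.ceil(values.length / blockSize);
  const packed = new Uint8Array(Math.ceil(values.length / 2));
  const scales = new Float32Array(numBlocks);

  for (let b = 0; b < numBlocks; b++) {
    const start = b * blockSize;
    const end = Math.min(start + blockSize, values.length);

    // Per-block scale: map the largest magnitude in the block to +/-7.
    let maxAbs = 0;
    for (let i = start; i < end; i++) {
      maxAbs = Math.max(maxAbs, Math.abs(values[i]));
    }
    scales[b] = maxAbs > 0 ? maxAbs / 7 : 1;
    const scale = scales[b]; // read back so both directions use the same value

    for (let i = start; i < end; i++) {
      // Round to the nearest step, shift by the zero point, clamp to 4 bits.
      const q = Math.min(15, Math.max(0, Math.round(values[i] / scale) + ZERO_POINT));
      // Even indices go to the low nibble, odd indices to the high nibble.
      packed[i >> 1] |= (i & 1) === 0 ? q : q << 4;
    }
  }
  return { packed, scales };
}

// Reverse the packing and scaling to recover approximate float values.
function dequantizeInt4(
  packed: Uint8Array,
  scales: Float32Array,
  length: number,
  blockSize: number,
): Float32Array {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const byte = packed[i >> 1];
    const q = (i & 1) === 0 ? byte & 0x0f : byte >> 4;
    out[i] = (q - ZERO_POINT) * scales[Math.floor(i / blockSize)];
  }
  return out;
}
```

Each weight takes 4 bits plus a share of its block's scale, which is what keeps the files small. The round trip is lossy: within a block, the reconstruction error per element is at most half of that block's scale.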