---
license: mit
---

This repository contains the [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) model converted to ONNX in fp16 and int8 formats for use with [Vespa Embedding](https://docs.vespa.ai/en/embedding.html).

- intfloat-multilingual-e5-large_fp16.onnx (fp16)
- intfloat-multilingual-e5-large_quantized.onnx (int8 quantized)

The int8 model was quantized using the [optimum](https://github.com/huggingface/optimum) toolkit.

## Example of Vespa services.xml

**Notice**: The FP16 model works well with Vespa versions `8.325.46` and above.

```
<component id="me5_large" type="hugging-face-embedder">
  <transformer-model
    url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/intfloat-multilingual-e5-large_fp16.onnx" />
  <!-- or use the int8 quantized model:
  <transformer-model
    url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/intfloat-multilingual-e5-large_quantized.onnx"
  />
  -->
  <tokenizer-model
    url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/tokenizer.json" />
  <normalize>true</normalize>
  <pooling-strategy>mean</pooling-strategy>
</component>
```
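For readers unfamiliar with the `<pooling-strategy>mean</pooling-strategy>` and `<normalize>true</normalize>` settings, the following is a minimal pure-Python sketch of what mean pooling followed by L2 normalization roughly amounts to. It is an illustration only, not Vespa's implementation; the function names and the sample values are hypothetical.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [s / count for s in summed]


def l2_normalize(vector):
    """Scale the vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


# Hypothetical 2-dimensional token embeddings; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
```

With normalized embeddings, the dot product of two vectors equals their cosine similarity.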

### Deploy

```
# The FP16 model has a larger file size, which can result in longer deployment times.
vespa deploy --wait 1800 .
```

## Tips: convert to int8 quantized

```
# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large
```

```
optimum-cli onnxruntime quantize --onnx_model ./me5-large -o me5-large-large_quantized --avx512_vnni
```
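To give an intuition for why the int8 model is smaller but slightly less precise, here is a toy sketch of symmetric linear quantization in pure Python. This is an assumption-laden illustration and not the exact scheme that optimum or ONNX Runtime applies; the function names and weight values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


# Hypothetical weight values.
weights = [0.5, -1.0, 0.25, 0.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# The round-trip error per value is bounded by about half the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each value is stored in one byte instead of four, at the cost of a small rounding error.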

## Tips: convert to fp16

```
# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large
```

- https://gist.github.com/hotchpotch/64fa52d32886fe61cc1d110066afef38

```
# https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py

import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16

# Load the fp32 ONNX model exported in the previous step.
onnx_model = onnx.load("me5-large/intfloat-multilingual-e5-large.onnx")
# Convert float32 tensors to float16, skipping ONNX shape inference.
model_fp16 = convert_float_to_float16(onnx_model, disable_shape_infer=True)
# Write the converted model next to the original.
onnx.save(model_fp16, "me5-large/intfloat-multilingual-e5-large_fp16.onnx")
```
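One possible sanity check after conversion is to compare embeddings from the original and converted models by cosine similarity. The sketch below only simulates fp16 rounding on a hypothetical embedding vector using the standard library; in practice you would run both ONNX models on the same input and compare their outputs. The helper names and values are illustrative assumptions.

```python
import math
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embedding values standing in for the fp32 model output.
reference = [0.0123, -0.0456, 0.0789, 0.0312, -0.0654]
# Simulated fp16 output: the same values rounded to half precision.
converted = [to_fp16(x) for x in reference]

similarity = cosine_similarity(reference, converted)
```

A similarity very close to 1.0 suggests the conversion preserved the embedding direction for that input.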

## License

This model is distributed under the same license as the original model, the MIT License (see the LICENSE file in the project's root directory).

- https://huggingface.co/intfloat/multilingual-e5-large

## Attribution

All credit for this model goes to the authors of Multilingual-E5-large and the associated researchers and organizations. When using this model, please attribute the original authors.