SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later.
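For concreteness, a minimal sketch of that backend switch, assuming sentence-transformers 3.2+ with the ONNX extras installed (pip install sentence-transformers[onnx]):

from sentence_transformers import SentenceTransformer

# backend="onnx" loads the ONNX export of the model; if the repository
# has no ONNX file yet, it is exported automatically on first load.
model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

embeddings = model.encode(["ONNX speeds up CPU inference.", "Thank me later."])
print(embeddings.shape)  # (2, 384) for all-MiniLM-L6-v2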
Use from_model2vec, or from_distillation if you want to do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.

Wow, this has quite a short processing time.
Awesome!
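A hedged sketch of both entry points mentioned above, assuming the StaticEmbedding module from sentence-transformers 3.2+ and an example Model2Vec repository (minishlab/M2V_base_output); model names here are illustrative:

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Option 1: load a ready-made Model2Vec model from the Hub.
static = StaticEmbedding.from_model2vec("minishlab/M2V_base_output")

# Option 2: distill a static model from any Sentence Transformer yourself;
# no dataset needed, roughly seconds on GPU, a couple of minutes on CPU.
# static = StaticEmbedding.from_distillation("BAAI/bge-base-en-v1.5", device="cuda")

model = SentenceTransformer(modules=[static])
print(model.encode(["Static embeddings are extremely fast."]).shape)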
imatrix quantization in place of QuIP#

Pass prompt_lookup_num_tokens=10 to your generate call, and you'll get faster LLMs.
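A minimal sketch of that call, assuming transformers 4.37+ (which added prompt lookup decoding) and an example model name (mistralai/Mistral-7B-Instruct-v0.2):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Summarize the following article:\n...", return_tensors="pt").to(model.device)

# prompt_lookup_num_tokens enables prompt lookup decoding: candidate tokens are
# copied from the prompt itself, which speeds up input-grounded tasks such as
# summarization without needing a separate draft model.
outputs = model.generate(**inputs, max_new_tokens=200, prompt_lookup_num_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Prompt lookup works best when the output is likely to repeat spans of the input, e.g. summarization, extraction, or code editing.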