quantization

#1 by djsoapyknuckles

This conversion is outputting garbled, unrecognizable audio. Can you explain a little more what your process was for converting and quantizing? I tried to use https://huggingface.co/spaces/ggml-org/gguf-my-repo to convert the pre-trained model to gguf, but it fails because tokenizer.model is missing; this repo only has a tokenizer.json.

As far as I can tell the quants are working fine, so I'm afraid that's a you problem.
Personally, I don't think the voice cloning capabilities of this model are that good in general (could be a skill issue on my part here), but I get normal audio output 100% of the time.

ref: [audio sample]

q8_0: [audio sample]

To answer your question: I cloned the repo and removed the <|audio|> token (I believe that was the one, at least), since, looking at the colab script, it doesn't seem to be used in the pretrained model, only in the finetuned one. Then I converted the model to gguf with convert_hf_to_gguf.py from the llama.cpp repo and quantized it with llama-quantize. You don't need tokenizer.model; tokenizer.json is enough.
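Roughly, the steps look like the sketch below. The paths and the exact `added_tokens` entry are my assumptions, not a copy of the actual commands used, so adjust for your setup:

```python
# Rough sketch of the conversion steps, assuming llama.cpp is cloned and
# built locally and the model repo is cloned to ./orpheus.
import json
import subprocess

# 1. Drop the unused token from tokenizer.json's added_tokens list.
#    The exact entry ("<|audio|>" here) is an assumption; check the colab
#    script, and note that config.json / the vocab may need the same edit.
with open("orpheus/tokenizer.json") as f:
    tok = json.load(f)
tok["added_tokens"] = [t for t in tok["added_tokens"] if t["content"] != "<|audio|>"]
with open("orpheus/tokenizer.json", "w") as f:
    json.dump(tok, f, ensure_ascii=False, indent=2)

# 2. Convert the HF checkpoint to a full-precision gguf.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", "orpheus",
     "--outfile", "orpheus-f16.gguf", "--outtype", "f16"],
    check=True,
)

# 3. Quantize it.
subprocess.run(
    ["llama.cpp/build/bin/llama-quantize",
     "orpheus-f16.gguf", "orpheus-q8_0.gguf", "Q8_0"],
    check=True,
)
```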

@Annuvin I got it to run in llama-cpp, but the outputs are badly garbled. Could you please share the scripts you used for inference? Your q8_0 clip sounds excellent.

Thanks for making the quant btw

I have a llama-cpp-python inference script here: https://github.com/Zuellni/Orpheus-GGUF
Take a look at the generate function if you just need the encoding/decoding part: https://github.com/Zuellni/Orpheus-GGUF/blob/main/classes.py#L41
It should work with this model, but the finetuned one seems to use a different prompt, so your mileage may vary.

Also a quick note: make sure you have a BOS token both before the input and before the audio transcript, since the HF tokenizer adds one by default. It's a bit of a weird choice to have two of them in the middle of a sequence, likely unintentional, but I was getting pretty weird outputs without them there as well.
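Something like this with llama-cpp-python (the model path and prompt strings are placeholders, and the exact prompt layout is an assumption, not the verified Orpheus format):

```python
# Minimal sketch of assembling a prompt with the doubled BOS token,
# mirroring what the HF tokenizer does when it encodes the two pieces
# of the prompt separately.
from llama_cpp import Llama

llm = Llama(model_path="orpheus-q8_0.gguf")  # hypothetical path
bos = llm.token_bos()

# Encode the input text and the audio transcript separately, without
# letting llama.cpp prepend BOS itself.
ref = llm.tokenize("reference input text".encode(), add_bos=False, special=True)
txt = llm.tokenize("audio transcript text".encode(), add_bos=False, special=True)

# BOS before the input *and* before the transcript, as noted above,
# which puts the second one in the middle of the sequence.
tokens = [bos] + ref + [bos] + txt
```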
