Samples
Hi,
Thanks for this quantization! Are the samples from the quantized model?
Cheers.
Hi,
Yes, the provided audio samples were generated using the Q4_K_M quantised model, available here:
🔗 lex-au/Orpheus-3b-FT-Q4_K_M.gguf
The primary difference you'll notice when using lower quantisation levels is that longer generations may exhibit some drift. However, in practice, the substantial reduction in latency makes this a worthwhile tradeoff for most applications.
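If you want to try the GGUF locally, here's a minimal sketch using llama-cpp-python; the model path, sampling parameters, and prompt below are illustrative placeholders rather than the official Orpheus template, and the raw output is an audio-token stream that still needs decoding (Orpheus uses a SNAC-style codec) before you get a waveform:

```python
# Minimal sketch: loading the Q4_K_M GGUF with llama-cpp-python.
# Prompt format and parameters are placeholders, not the official ones.
from llama_cpp import Llama

llm = Llama(
    model_path="Orpheus-3b-FT-Q4_K_M.gguf",  # the quantised checkpoint
    n_ctx=4096,       # longer generations are where drift tends to appear
    n_gpu_layers=-1,  # offload all layers to the GPU if it fits
)

out = llm(
    "tara: Hello there!",  # illustrative prompt only
    max_tokens=1024,
    temperature=0.6,
)
print(out["choices"][0]["text"])  # audio tokens, not readable text
```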
I'd also recommend giving my project, Orpheus-FASTAPI, a try if you want to test it out.
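For example, assuming the default OpenAI-style `/v1/audio/speech` endpoint on port 5005 (double-check the README in case the route or port differs in your setup), a quick smoke test from Python might look like:

```python
# Quick smoke test against Orpheus-FASTAPI.
# The /v1/audio/speech route and port 5005 are assumed defaults;
# adjust to match your deployment.
import requests

resp = requests.post(
    "http://localhost:5005/v1/audio/speech",
    json={
        "model": "orpheus",  # served model name (assumption)
        "voice": "tara",     # one of the built-in voices
        "input": "Hello from the quantised model!",
    },
)
resp.raise_for_status()

with open("speech.wav", "wb") as f:
    f.write(resp.content)  # raw audio bytes returned by the server
print("wrote speech.wav")
```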
Cheers,
Lex
Hi,
Nice, I'll test this API, although my GPU is just what comes with an MBP.
Btw, is the drift you noticed more like an intonation error, or a pronunciation error, or both?
Cheers
e.
The “drift” I’m referring to is less about audio-quality degradation and more about the model’s ability to maintain contextual consistency throughout a generation. For example, a sentence might begin with a sad tone because the tokenizer happened to assign emotional weight that way, but then suddenly shift to a loud or neutral delivery mid-thought. It’s this kind of inconsistent emotional or tonal carry-over that I’m highlighting.
I see; basically that implies Q4 quantizations are a no-go for production workloads. I hope Q8, if not Q6, works flawlessly?
Btw, check out Gapeleon/slim-orpheus-3b-JAPANESE-ft