Samples
Hi,
Thanks for this quantization! Are the samples from the quantized model?
Cheers.
Hi,
Yes, the provided audio samples were generated using the Q4_K_M quantised model, available here:
🔗 lex-au/Orpheus-3b-FT-Q4_K_M.gguf
The primary difference you'll notice when using lower quantisation levels is that longer generations may exhibit some drift. However, in practice, the substantial reduction in latency makes this a worthwhile tradeoff for most applications.
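If you want to try the GGUF locally, here's a minimal sketch using llama-cpp-python; the model path, sampling parameters, and prompt below are illustrative placeholders rather than the official Orpheus template, and the raw output is an audio-token stream that still needs decoding (Orpheus uses a SNAC-style codec) before you get a waveform:

```python
# Minimal sketch: loading the Q4_K_M GGUF with llama-cpp-python.
# Prompt format and parameters are placeholders, not the official ones.
from llama_cpp import Llama

llm = Llama(
    model_path="Orpheus-3b-FT-Q4_K_M.gguf",  # the quantised checkpoint
    n_ctx=4096,       # longer generations are where drift tends to appear
    n_gpu_layers=-1,  # offload all layers to the GPU if it fits
)

out = llm(
    "tara: Hello there!",  # illustrative prompt only
    max_tokens=1024,
    temperature=0.6,
)
print(out["choices"][0]["text"])  # audio tokens, not readable text
```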
I'd also recommend giving my project, Orpheus-FASTAPI, a try if you want to test it out.
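For example, assuming the default OpenAI-style `/v1/audio/speech` endpoint on port 5005 (double-check the README in case the route or port differs in your setup), a quick smoke test from Python might look like:

```python
# Quick smoke test against Orpheus-FASTAPI.
# The /v1/audio/speech route and port 5005 are assumed defaults;
# adjust to match your deployment.
import requests

resp = requests.post(
    "http://localhost:5005/v1/audio/speech",
    json={
        "model": "orpheus",  # served model name (assumption)
        "voice": "tara",     # one of the built-in voices
        "input": "Hello from the quantised model!",
    },
)
resp.raise_for_status()

with open("speech.wav", "wb") as f:
    f.write(resp.content)  # raw audio bytes returned by the server
print("wrote speech.wav")
```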
Cheers,
Lex
Hi,
Nice, I'll test this API, although my GPU is just what comes with an MBP.
Btw, is the drift you noticed more like an intonation error, or a pronunciation error, or both?
Cheers
e.
The “drift” I’m referring to is less about audio-quality degradation and more about the model’s ability to maintain contextual consistency throughout a generation. For example, a sentence might begin with a sad tone because the tokenizer happened to assign emotional weight that way, but then suddenly shift to a loud or neutral delivery mid-thought. It’s this kind of inconsistent emotional or tonal carry-over that I’m highlighting.
I see; basically that implies Q4 quantizations are a no-go for production workloads. I hope Q8, if not Q6, works flawlessly?
Btw, check out Gapeleon/slim-orpheus-3b-JAPANESE-ft