RealTime
Hi guys, great job! That's cool! Any suggestions for an open-source real-time mode? Thanks
I'm trying to see if it can work with the OpenAI Realtime Console
https://github.com/openai/openai-realtime-console
What are you trying it out on for real-time? @Kar0nte
@Someshfengde I was thinking about frameworks like LiveKit and FastRTC for real-time streaming. Do you think CSM-1B is fast enough for a WebRTC pipeline, or would we need additional optimization?
The demo gets around the limitations of the model by starting to process the input while the user is still talking, then it seems to stitch the responses together. You can basically force it to use the entire generation time by asking it a series of "repeat after me" in the same sentence, followed by something that triggers its guardrails (expletives, etc.)
To replicate the demo, you'd basically need a fast speech-to-text model, then fire the transcript off to the LLM for a response and feed that response to CSM for audio generation.
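For concreteness, here's a minimal sketch of that pipeline. Assumptions not from this thread: faster-whisper for the STT step, an OpenAI-compatible chat endpoint for the LLM, and the load_csm_1b / generate API from the sesame/csm README for the audio step.

```python
# Sketch only: STT -> LLM -> CSM, under the assumptions stated above.
import torchaudio
from faster_whisper import WhisperModel
from openai import OpenAI
from generator import load_csm_1b  # from the sesame/csm repo

stt = WhisperModel("base", device="cuda")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment
tts = load_csm_1b(device="cuda")

def respond(wav_path: str) -> str:
    # 1. Speech-to-text: transcribe the user's utterance.
    segments, _ = stt.transcribe(wav_path)
    user_text = " ".join(s.text for s in segments)

    # 2. LLM: generate a reply to the transcript.
    reply = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_text}],
    ).choices[0].message.content

    # 3. CSM: synthesize the reply as audio.
    audio = tts.generate(text=reply, speaker=0, context=[],
                         max_audio_length_ms=10_000)
    torchaudio.save("reply.wav", audio.unsqueeze(0).cpu(), tts.sample_rate)
    return reply
```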
Hi @quadratrix, that's why I was thinking about frameworks like LiveKit, where you can plug in STT (like Whisper), an LLM, VAD, and TTS, and I wanted to understand whether it would be possible to use CSM there. But from what I'm finding out, the 1B model is very immature and needs a lot of optimization. What do you think? To host it locally you'd still need a lot of compute for acceptable latency. And it seems to only handle English, and not very well either. I think it will take a long time before we can use it well.
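On the VAD piece specifically, here's a rough, framework-agnostic sketch using Silero VAD to detect when the user has stopped talking, so the transcript can be fired off to the LLM. The chunk size and sample rate follow Silero's streaming example; nothing here is LiveKit-specific.

```python
# Sketch: end-of-speech detection with Silero VAD's streaming iterator.
import torch

vad_model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad",
                                  model="silero_vad")
(get_speech_timestamps, save_audio, read_audio,
 VADIterator, collect_chunks) = utils

vad = VADIterator(vad_model, sampling_rate=16000)

def end_of_speech(chunk: torch.Tensor) -> bool:
    """Feed 512-sample, 16 kHz mono chunks; returns True when speech ends."""
    event = vad(chunk, return_seconds=True)  # None, {'start': s}, or {'end': s}
    return bool(event) and "end" in event
```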
MrDragonFox on Discord seems to suggest he's got response times down to 3.3 RTR (millisecond response), but he doesn't upload an audio.wav file to validate that the output is usable, or offer anything else to back it up.
https://discord.com/channels/1349855029938487437/1349855193151569972
This guy seems to really know his stuff. I learned a lot from his code this evening and was able to get a working end-to-end voice chat similar to what you're asking about.
https://github.com/nytopop/csm
The audio sounded like crap, though. It needs a lot of tweaking to get the streaming parameters synced up. Still, very promising!
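One knob that helps with getting the chunks synced up is prebuffering: hold back playback until a few chunks are queued, so a slow generation step doesn't underrun the output device. A rough sketch with sounddevice; the chunk count and sizes here are guesses to tune, not values from that repo.

```python
# Sketch: prebuffered playback of streamed audio chunks.
import queue
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 24000       # CSM output rate
PREBUFFER_CHUNKS = 4      # start playback only once this many chunks are queued

audio_q: "queue.Queue[np.ndarray]" = queue.Queue()

def play_stream() -> None:
    # Producer side (elsewhere): audio_q.put(chunk.cpu().numpy().astype("float32"))
    # for each generated chunk, then audio_q.put(None) when finished.
    while audio_q.qsize() < PREBUFFER_CHUNKS:
        sd.sleep(10)  # wait until enough audio is buffered
    with sd.OutputStream(samplerate=SAMPLE_RATE, channels=1,
                         dtype="float32") as out:
        while True:
            chunk = audio_q.get()
            if chunk is None:  # sentinel: generation finished
                break
            out.write(chunk)   # blocking write keeps playback continuous
```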
I honestly don't think "bootleg Maya" is all that far away. I have gotten the voice dialed in really well in my repo. Transcripts are key. I believe a changing set of context segments, grouped by emotion, is going to be needed to get anywhere near demo quality though.
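As an illustration of that idea, here's a sketch using the Segment/generate API from the sesame/csm README; the emotion tags, file paths, and selection rule are hypothetical, not the actual approach from any repo linked here.

```python
# Sketch: swap in a different context set depending on the desired emotion.
import torchaudio
from generator import Segment, load_csm_1b  # from the sesame/csm repo

generator = load_csm_1b(device="cuda")

def load_ref(path: str):
    # Load a reference clip and resample it to CSM's sample rate.
    audio, sr = torchaudio.load(path)
    return torchaudio.functional.resample(
        audio.squeeze(0), orig_freq=sr, new_freq=generator.sample_rate
    )

# Hypothetical bank of reference clips with transcripts, tagged by emotion.
context_bank = {
    "happy": [Segment(text="That's wonderful news!", speaker=0,
                      audio=load_ref("happy.wav"))],
    "calm": [Segment(text="Let me think about that for a moment.", speaker=0,
                     audio=load_ref("calm.wav"))],
}

def speak(text: str, emotion: str):
    # Condition generation on the context set that matches the target emotion.
    return generator.generate(
        text=text,
        speaker=0,
        context=context_bank[emotion],
        max_audio_length_ms=10_000,
    )
```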
@zenoran can you share your repo?
Mostly Copilot code; I haven't gone through and cleaned it up, but it's stable. It got so fast with the compiler that the sentences now overlap, so it needs some tweaking. Gotta do the day job though 😔
https://github.com/zenoran/sesameai-tts
BTW this is just console TTS; I haven't pushed the end-to-end web version I got working based on that guy's repo.
Thanks for your response. I'll try to run this on lightning.ai.
I've been trying to run this on macOS but no luck :( I raised an issue on their official repo.
I was able to stream in real time, but it's a bit slow. The model doesn't seem to work well for real-time streaming: even with one H200 it doesn't generate stream chunks fast enough, and the sound comes out a bit choppy.
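A quick way to put a number on "not fast enough" is the real-time factor: wall-clock generation time divided by seconds of audio produced, which must stay below 1.0 for gap-free streaming. A sketch using the generate API from the sesame/csm README:

```python
# Sketch: measure the real-time factor (RTF) of one generation call.
import time
from generator import load_csm_1b  # from the sesame/csm repo

generator = load_csm_1b(device="cuda")

start = time.perf_counter()
audio = generator.generate(
    text="Measuring generation speed for one sentence.",
    speaker=0,
    context=[],
    max_audio_length_ms=10_000,
)
elapsed = time.perf_counter() - start

audio_seconds = audio.shape[-1] / generator.sample_rate
print(f"RTF: {elapsed / audio_seconds:.2f} (below 1.0 = faster than real time)")
```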
Disabling watermarks improves performance by around 20%. However, I'm still not achieving real-time speed on an RTX 5070 Ti; I'm getting around 0.5x.
EDIT: I was able to achieve real time by compiling the decoder with inductor as the backend!
Try adding "model.decoder = torch.compile(model.decoder, fullgraph=True, mode='reduce-overhead')" on L178 in load_csm_1b.
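For context, here's roughly where that line lands, assuming a load_csm_1b shaped like the one in the sesame/csm repo's generator.py (the surrounding lines are illustrative, not an exact copy):

```python
# Sketch of the patch inside the CSM repo's generator.py.
import torch
from models import Model  # Model and Generator are the repo's own classes

def load_csm_1b(device: str = "cuda") -> "Generator":
    model = Model.from_pretrained("sesame/csm-1b")
    model.to(device=device, dtype=torch.bfloat16)
    # Compile only the audio decoder; 'reduce-overhead' enables CUDA graphs,
    # which pays off because decoding runs many small steps per audio frame.
    # Expect the first few generate() calls to be slow while inductor warms up.
    model.decoder = torch.compile(model.decoder, fullgraph=True,
                                  mode="reduce-overhead")
    return Generator(model)
```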
Can this be integrated with APIs to get real-time data, or to perform a task like making an appointment by calling a different API?