
sesame/csm-1b (Text-to-Speech)
I was thinking exactly the same thing when ChatGPT first came out! I have run some small experiments with causal language modeling, using a fixed number of users/speakers and then instruction fine-tuning the base (foundation) model. A "dynamic number of speakers" sounds interesting, though! Maybe there is a clever way to inject new tokens into the vocabulary to achieve this.
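To make the "inject new tokens" idea a bit more concrete, here is a rough, library-free sketch of one way it could work. Everything in it is an assumption for illustration: the `SpeakerVocab` class, the `<speaker:NAME>` token format, and the choice to initialize a new speaker's embedding as the mean of the existing speaker embeddings are all hypothetical, not anything taken from csm-1b. With Hugging Face `transformers`, the analogous step would presumably be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
import random


class SpeakerVocab:
    """Toy vocabulary plus embedding table that can grow with new speaker tokens.

    Hypothetical illustration only; not how any particular model handles speakers.
    """

    def __init__(self, base_tokens, dim=4, seed=0):
        self.rng = random.Random(seed)
        self.dim = dim
        self.token_to_id = {}
        self.embeddings = []   # one list of floats per token id
        self.speaker_ids = []  # ids of speaker tokens added so far
        for token in base_tokens:
            self._add(token, self._random_vector())

    def _random_vector(self):
        # Small random init, as is common for embedding rows.
        return [self.rng.gauss(0.0, 0.02) for _ in range(self.dim)]

    def _add(self, token, vector):
        token_id = len(self.embeddings)
        self.token_to_id[token] = token_id
        self.embeddings.append(vector)
        return token_id

    def add_speaker(self, name):
        """Return the id of the speaker token, creating it on first use."""
        token = f"<speaker:{name}>"
        if token in self.token_to_id:
            return self.token_to_id[token]
        if self.speaker_ids:
            # Assumption: start a new speaker at the mean of known speakers,
            # so it behaves like a "generic speaker" until it is fine-tuned.
            known = [self.embeddings[i] for i in self.speaker_ids]
            vector = [sum(column) / len(known) for column in zip(*known)]
        else:
            vector = self._random_vector()
        token_id = self._add(token, vector)
        self.speaker_ids.append(token_id)
        return token_id

    def encode_turn(self, speaker, words):
        """Encode one dialogue turn as [speaker token id, word ids...]."""
        ids = [self.add_speaker(speaker)]
        ids.extend(self.token_to_id[word] for word in words)  # KeyError if unknown
        return ids


vocab = SpeakerVocab(["hello", "there", "hi"])
turn_a = vocab.encode_turn("alice", ["hello", "there"])
turn_b = vocab.encode_turn("bob", ["hi"])
turn_c = vocab.encode_turn("alice", ["hi"])  # reuses alice's existing token
```

The vocabulary grows only when an unseen speaker shows up, so the number of speakers does not have to be fixed up front. The new embedding rows would still need some fine-tuning before they carry any speaker-specific information.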
Would love to contribute to this initiative.