Replace tracking fix with official version
- README.md +5 -8
- config.json +9 -0
README.md CHANGED

````diff
@@ -7,7 +7,6 @@ tags:
 base_model:
 - sesame/csm-1b
 pipeline_tag: text-to-speech
-library_name: diffusers
 ---
 
 ## CSM 1B (Safetensors)
@@ -27,7 +26,7 @@ A hosted [HuggingFace space](https://huggingface.co/spaces/sesame/csm-1b) is als
 
 Setup the repo
 
-```
+```sh
 git clone git@github.com:SesameAILabs/csm.git
 cd csm
 python3.10 -m venv .venv
@@ -37,13 +36,11 @@ pip install -r requirements.txt
 
 Generate a sentence
 
-```
-from huggingface_hub import hf_hub_download
+```py
 from generator import load_csm_1b
 import torchaudio
 
-
-generator = load_csm_1b(model_path, "cuda")
+generator = load_csm_1b(device="cuda")
 audio = generator.generate(
     text="Hello from Sesame.",
     speaker=0,
@@ -56,7 +53,7 @@ torchaudio.save("audio.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)
 
 CSM sounds best when provided with context. You can prompt or provide context to the model using a `Segment` for each speaker utterance.
 
-```
+```py
 speakers = [0, 1, 0, 0]
 transcripts = [
     "Hey how are you doing.",
@@ -117,4 +114,4 @@ This project provides a high-quality speech generation model for research and ed
 
 By using this model, you agree to comply with all applicable laws and ethical guidelines. We are **not responsible** for any misuse, and we strongly condemn unethical applications of this technology.
 
 **Authors**
-Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.
+Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.
````
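The README's context example pairs each speaker id with a transcript before wrapping the pair in a `Segment`. A minimal sketch of that pairing step, using a stand-in dataclass — the real `Segment` lives in the csm repo and also carries the utterance audio, and every transcript after the first is an illustrative placeholder, not text from this commit:

```python
from dataclasses import dataclass

# Stand-in for csm's Segment; the real class also holds the audio
# tensor used to condition the model on each prior utterance.
@dataclass
class Segment:
    speaker: int
    text: str

speakers = [0, 1, 0, 0]
transcripts = [
    "Hey how are you doing.",  # from the README
    # remaining lines are illustrative placeholders
    "Pretty good, and you?",
    "Doing well.",
    "Glad to hear it.",
]

# One Segment per utterance, pairing speaker ids with transcripts in order.
context = [Segment(speaker=s, text=t) for s, t in zip(speakers, transcripts)]
```

The resulting `context` list is what the README's prompting pattern would pass alongside the new text to generate.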
config.json ADDED

```diff
@@ -0,0 +1,9 @@
+{
+    "args": {
+        "audio_num_codebooks": 32,
+        "audio_vocab_size": 2051,
+        "backbone_flavor": "llama-1B",
+        "decoder_flavor": "llama-100M",
+        "text_vocab_size": 128256
+    }
+}
```
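The added config.json nests the model hyperparameters one level down, under an `"args"` key, so any consumer has to unwrap that level before reading fields. A minimal sketch of reading the file as committed (the file contents are inlined here to keep the example self-contained; how the values are consumed downstream is not specified by this commit):

```python
import json

# The config.json added by this commit, inlined verbatim.
config_text = """
{
    "args": {
        "audio_num_codebooks": 32,
        "audio_vocab_size": 2051,
        "backbone_flavor": "llama-1B",
        "decoder_flavor": "llama-100M",
        "text_vocab_size": 128256
    }
}
"""

# Hyperparameters sit under the top-level "args" key.
args = json.loads(config_text)["args"]
print(args["backbone_flavor"])  # llama-1B
```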