lunahr committed on
Commit 5304d23 · unverified · 1 Parent(s): 9b5903f

Replace tracking fix with official version

Files changed (2)
  1. README.md +5 -8
  2. config.json +9 -0
README.md CHANGED
@@ -7,7 +7,6 @@ tags:
  base_model:
  - sesame/csm-1b
  pipeline_tag: text-to-speech
- library_name: diffusers
  ---
 
  ## CSM 1B (Safetensors)
@@ -27,7 +26,7 @@ A hosted [HuggingFace space](https://huggingface.co/spaces/sesame/csm-1b) is als
 
  Setup the repo
 
- ```bash
+ ```sh
  git clone git@github.com:SesameAILabs/csm.git
  cd csm
  python3.10 -m venv .venv
@@ -37,13 +36,11 @@ pip install -r requirements.txt
 
  Generate a sentence
 
- ```python
- from huggingface_hub import hf_hub_download
+ ```py
  from generator import load_csm_1b
  import torchaudio
 
- model_path = hf_hub_download(repo_id="sesame/csm-1b", filename="ckpt.pt")
- generator = load_csm_1b(model_path, "cuda")
+ generator = load_csm_1b(device="cuda")
  audio = generator.generate(
      text="Hello from Sesame.",
      speaker=0,
@@ -56,7 +53,7 @@ torchaudio.save("audio.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)
 
  CSM sounds best when provided with context. You can prompt or provide context to the model using a `Segment` for each speaker utterance.
 
- ```python
+ ```py
  speakers = [0, 1, 0, 0]
  transcripts = [
      "Hey how are you doing.",
@@ -117,4 +114,4 @@ This project provides a high-quality speech generation model for research and ed
  By using this model, you agree to comply with all applicable laws and ethical guidelines. We are **not responsible** for any misuse, and we strongly condemn unethical applications of this technology.
 
  **Authors**
- Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.
+ Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.
config.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "args": {
+         "audio_num_codebooks": 32,
+         "audio_vocab_size": 2051,
+         "backbone_flavor": "llama-1B",
+         "decoder_flavor": "llama-100M",
+         "text_vocab_size": 128256
+     }
+ }
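
Reviewer note: the added `config.json` nests five model hyperparameters under a single `args` key. A minimal sketch of how a consumer might read and validate that shape is shown below. The `CsmArgs` dataclass and `parse_config` helper are hypothetical names introduced for illustration; the commit does not show how the loader actually consumes this file.

```python
import json
from dataclasses import dataclass

# Contents of the config.json added in this commit, inlined so the
# sketch runs without the repository checked out.
CONFIG_TEXT = """
{
    "args": {
        "audio_num_codebooks": 32,
        "audio_vocab_size": 2051,
        "backbone_flavor": "llama-1B",
        "decoder_flavor": "llama-100M",
        "text_vocab_size": 128256
    }
}
"""


@dataclass(frozen=True)
class CsmArgs:
    """Hypothetical container mirroring the keys under "args"."""

    audio_num_codebooks: int
    audio_vocab_size: int
    backbone_flavor: str
    decoder_flavor: str
    text_vocab_size: int


def parse_config(text: str) -> CsmArgs:
    """Parse the JSON text and map the "args" object onto CsmArgs.

    Raises KeyError if "args" is absent and TypeError if the keys
    under "args" do not match the dataclass fields exactly.
    """
    raw = json.loads(text)
    return CsmArgs(**raw["args"])


args = parse_config(CONFIG_TEXT)
print(args.backbone_flavor, args.decoder_flavor, args.audio_num_codebooks)
```

Unpacking with `**raw["args"]` makes a missing or misspelled key fail at load time rather than surfacing later as a wrong default.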