Update README.md

README.md CHANGED
@@ -60,7 +60,7 @@ model = model.to(device)
 
 # Set up text and timing conditioning
 conditioning = [{
-    "prompt": "128 BPM tech house drum loop",
+    "prompt": "specialKay vocal, A cappella, 120 BPM twobob house vocalisations feat. Special Kay",
     "seconds_start": 0,
     "seconds_total": 30
 }]
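For context, this first hunk edits the stable-audio-tools usage example near the top of the README. A minimal sketch of the surrounding code, assuming it follows the upstream Stable Audio Open snippet (the checkpoint id, sampler settings, and output filename below are assumptions; only the prompt line comes from this diff):

```python
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the model; the base checkpoint id is a placeholder here,
# since the fine-tuned weights in this repo may be loaded differently
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]
model = model.to(device)

# Set up text and timing conditioning (prompt taken from the hunk above)
conditioning = [{
    "prompt": "specialKay vocal, A cappella, 120 BPM twobob house vocalisations feat. Special Kay",
    "seconds_start": 0,
    "seconds_total": 30
}]

# Generate stereo audio conditioned on the prompt and timing
output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device,
)

# Collapse the batch dimension, peak-normalize, and save as 16-bit WAV
output = rearrange(output, "b d n -> d (b n)")
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
```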
@@ -99,7 +99,7 @@ pipe = StableAudioPipeline.from_pretrained("PsiPi/audio", torch_dtype=torch.float16)
 pipe = pipe.to("cuda")
 
 # define the prompts
-prompt = "The sound of a hammer hitting a wooden surface."
+prompt = "The sound of specialKay vocalising, A cappella"
 negative_prompt = "Low quality."
 
 # set the seed for generator
@@ -116,7 +116,7 @@ audio = pipe(
 ).audios
 
 output = audio[0].T.float().cpu().numpy()
-sf.write("hammer.wav", output, pipe.vae.sampling_rate)
+sf.write("specialk.wav", output, pipe.vae.sampling_rate)
 
 ```
 Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/index) for more details on optimization and usage.
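The two hunks above patch isolated lines of the README's diffusers example. For reference, a stitched-together sketch of the whole snippet, following the documented StableAudioPipeline call; the step count, clip duration, waveform count, and seed are illustrative assumptions rather than values taken from this README:

```python
import torch
import soundfile as sf
from diffusers import StableAudioPipeline

pipe = StableAudioPipeline.from_pretrained("PsiPi/audio", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# define the prompts
prompt = "The sound of specialKay vocalising, A cappella"
negative_prompt = "Low quality."

# set the seed for generator (assumed value)
generator = torch.Generator("cuda").manual_seed(0)

# run the generation; steps, duration, and waveform count are assumptions
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=200,
    audio_end_in_s=10.0,
    num_waveforms_per_prompt=3,
    generator=generator,
).audios

# write the first waveform at the VAE's native sampling rate
output = audio[0].T.float().cpu().numpy()
sf.write("specialk.wav", output, pipe.vae.sampling_rate)
```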
@@ -134,7 +134,7 @@ Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/index
 ## Training dataset
 
 ### Datasets Used
-Our dataset consists of 486492 audio recordings, where 472618 are from Freesound and 13874 are from the Free Music Archive (FMA). All audio files are licensed under CC0, CC BY, or CC Sampling+. This data is used to train our autoencoder and DiT. We use a publicly available pre-trained T5 model ([t5-base](https://huggingface.co/google-t5/t5-base)) for text conditioning.
+Our dataset consists of 486492 audio recordings, where 472618 are from Freesound and 13874 are from the Free Music Archive (FMA). All audio files are licensed under CC0, CC BY, or CC Sampling+. This data is used to train our autoencoder and DiT. We use a publicly available pre-trained T5 model ([t5-base](https://huggingface.co/google-t5/t5-base)) for text conditioning. We then further fine-tuned on [this Kaggle dataset](https://www.kaggle.com/datasets/twobob/moar-bobtex-n-friends-gpu-fodder).
 
 ### Attribution
 Attribution for all audio recordings used to train Stable Audio Open 1.0 can be found in this repository.