PsiPi committed
Commit eaecbf6
Parent: 1c38862

Update README.md

Files changed (1):
  1. README.md +4 -4
README.md CHANGED
@@ -60,7 +60,7 @@ model = model.to(device)
 
 # Set up text and timing conditioning
 conditioning = [{
-    "prompt": "128 BPM tech house drum loop",
+    "prompt": "specialKay vocal, A cappella, 120 BPM twobob house vocalisations feat. Special Kay",
     "seconds_start": 0,
     "seconds_total": 30
 }]
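For context, the conditioning block above plugs into the stable-audio-tools generation example earlier in the README (the `model = model.to(device)` context line in the hunk header belongs to it). Below is a minimal sketch of how the new prompt would be used end to end, assuming the README follows the standard Stable Audio Open example; the repo id passed to `get_pretrained_model` and the sampler settings are assumptions, not taken from this diff:

```python
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the model (repo id is an assumption; substitute the id this README uses)
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]
model = model.to(device)

# Set up text and timing conditioning (the prompt introduced by this commit)
conditioning = [{
    "prompt": "specialKay vocal, A cappella, 120 BPM twobob house vocalisations feat. Special Kay",
    "seconds_start": 0,
    "seconds_total": 30
}]

# Generate stereo audio (sampler settings follow the upstream example, not this diff)
output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device
)

# Flatten the batch to one channels-first sequence, peak-normalize to int16, and save
output = rearrange(output, "b d n -> d (b n)")
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
```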
@@ -99,7 +99,7 @@ pipe = StableAudioPipeline.from_pretrained("PsiPi/audio", torch_dtype=torch.floa
 pipe = pipe.to("cuda")
 
 # define the prompts
-prompt = "The sound of a hammer hitting a wooden surface."
+prompt = "The sound of specialKay vocalising, A cappella"
 negative_prompt = "Low quality."
 
 # set the seed for generator
@@ -116,7 +116,7 @@ audio = pipe(
 ).audios
 
 output = audio[0].T.float().cpu().numpy()
-sf.write("hammer.wav", output, pipe.vae.sampling_rate)
+sf.write("specialk.wav", output, pipe.vae.sampling_rate)
 
 ```
 Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/index) for more details on optimization and usage.
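Likewise, the two diffusers hunks above belong to one `StableAudioPipeline` snippet. A hedged sketch of the complete snippet after this commit, assuming the unchanged README lines match the stock diffusers example; the seed and the call arguments such as `num_inference_steps` and `audio_end_in_s` are assumptions:

```python
import torch
import soundfile as sf
from diffusers import StableAudioPipeline

# Load the pipeline in half precision (repo id taken from the hunk header above)
pipe = StableAudioPipeline.from_pretrained("PsiPi/audio", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# define the prompts (the prompt introduced by this commit)
prompt = "The sound of specialKay vocalising, A cappella"
negative_prompt = "Low quality."

# set the seed for generator (seed value is an assumption)
generator = torch.Generator("cuda").manual_seed(0)

# run the generation (argument values follow the standard diffusers example, not this diff)
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=200,
    audio_end_in_s=10.0,
    num_waveforms_per_prompt=3,
    generator=generator,
).audios

# soundfile expects (frames, channels), hence the transpose before writing
output = audio[0].T.float().cpu().numpy()
sf.write("specialk.wav", output, pipe.vae.sampling_rate)
```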
@@ -134,7 +134,7 @@ Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/index
 ## Training dataset
 
 ### Datasets Used
-Our dataset consists of 486492 audio recordings, where 472618 are from Freesound and 13874 are from the Free Music Archive (FMA). All audio files are licensed under CC0, CC BY, or CC Sampling+. This data is used to train our autoencoder and DiT. We use a publicly available pre-trained T5 model ([t5-base](https://huggingface.co/google-t5/t5-base)) for text conditioning.
+Our dataset consists of 486492 audio recordings, where 472618 are from Freesound and 13874 are from the Free Music Archive (FMA). All audio files are licensed under CC0, CC BY, or CC Sampling+. This data is used to train our autoencoder and DiT. We use a publicly available pre-trained T5 model ([t5-base](https://huggingface.co/google-t5/t5-base)) for text conditioning. We then further fine-tuned on [this Kaggle dataset](https://www.kaggle.com/datasets/twobob/moar-bobtex-n-friends-gpu-fodder).
 
 ### Attribution
 Attribution for all audio recordings used to train Stable Audio Open 1.0 can be found in this repository.
 