---
license: apache-2.0
base_model:
- facebook/musicgen-large
base_model_relation: quantized
pipeline_tag: text-to-audio
language:
- en
tags:
- text-to-audio
- music-generation
- pytorch
- annthem
- qlip
- thestage
---
# Elastic model: MusicGen Large. Fastest and most flexible models for self-hosting.
# Attention: this page is for informational purposes only. To use the models, you need to wait for an update of the `elastic_models` package!
Elastic models are the models produced by TheStage AI ANNA: Automated Neural Networks Accelerator. ANNA allows you to control model size, latency and quality with a simple slider movement. For each model, ANNA produces a series of optimized models:
* __XL__: Mathematically equivalent neural network (original compiled `facebook/musicgen-large`), optimized with our DNN compiler.
* __L__: Near lossless model, with minimal degradation obtained on corresponding audio quality benchmarks.
* __M__: Faster model, with minor and acceptable accuracy degradation.
* __S__: The fastest model, with slight accuracy degradation.
* __Original__: The original `facebook/musicgen-large` model from Hugging Face, without QLIP compilation.
__Goals of elastic models:__
* Provide flexibility in cost vs quality selection for inference
* Provide clear quality and latency benchmarks for audio generation
* Provide the familiar interface of the HF `transformers` library through `elastic_models`: using the optimized versions requires only a single line of code change (see the sketch below).
* Provide models supported on a wide range of hardware (NVIDIA GPUs), which are pre-compiled and require no JIT.
* Provide the best models and service for self-hosting.
> It's important to note that specific quality degradation can vary. We aim for S models to retain high perceptual quality. The "Original" in tables refers to the non-compiled Hugging Face model, while "XL" is the compiled original. S, M, L are ANNA-quantized and compiled.
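To illustrate the single-line change mentioned in the goals above, here is a minimal sketch. It only shows the import swap and the `mode` argument; the full, runnable example (token, dtype, generation settings) is in the Inference section below.
```python
# Standard Hugging Face usage:
# from transformers import MusicgenForConditionalGeneration
# model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-large")

# Elastic (optimized) usage -- only the import and the `mode` argument change:
from elastic_models.transformers import MusicgenForConditionalGeneration

model = MusicgenForConditionalGeneration.from_pretrained(
    "facebook/musicgen-large",
    mode="S",  # one of "S", "M", "L", "XL"
)
```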

## Audio Examples
Below are a few examples demonstrating the audio quality of the different Elastic MusicGen Large versions. These 20-second samples were generated on an NVIDIA H100 GPU. For a more comprehensive set of examples and interactive demos, please visit [musicgen.thestage.ai](http://music.thestage.ai).
**Prompt:** "Calm lofi hip hop track with a simple piano melody and soft drums" (Audio: 20 seconds, H100 GPU)
| S | M | L | XL (Compiled Original) | Original (HF Non-Compiled) |
|----------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/S82_oagiYy2r00ZYpBJ3Q.mpga"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/n7RWM2q3YHUE0oA-oiISy.mpga"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/LBnfVjM2jNEqndVhBnXok.mpga"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/TYINxt_EcH-60oHMnO-B0.mpga"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/IKxeZ2LVYNsrjeNE9B7vS.mpga"></audio> |
-----
## Inference
To run inference with our MusicGen models, use the `elastic_models.transformers.MusicgenForConditionalGeneration` class. If you have compiled engines, provide the path to them; otherwise, for the non-compiled original model, you can use the standard Hugging Face `transformers.MusicgenForConditionalGeneration` (a sketch is shown after the example below).
**Example using `elastic_models` with a compiled model:**
```python
import torch
import scipy.io.wavfile
from transformers import AutoProcessor
from elastic_models.transformers import MusicgenForConditionalGeneration

model_name_hf = "facebook/musicgen-large"
elastic_mode = "S"  # one of "S", "M", "L", "XL"
prompt = "A groovy funk bassline with a tight drum beat"
output_wav_path = "generated_audio_elastic_S.wav"
hf_token = "YOUR_TOKEN"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the text processor and the ANNA-optimized model in the selected mode
processor = AutoProcessor.from_pretrained(model_name_hf, token=hf_token)
model = MusicgenForConditionalGeneration.from_pretrained(
    model_name_hf,
    token=hf_token,
    torch_dtype=torch.float16,
    mode=elastic_mode,
    device=device,
).to(device)
model.eval()

# Tokenize the text prompt
inputs = processor(
    text=[prompt],
    padding=True,
    return_tensors="pt",
).to(device)

print(f"Generating audio for: {prompt}...")
generate_kwargs = {
    "do_sample": True,
    "guidance_scale": 3.0,
    "max_new_tokens": 256,  # roughly 5 seconds of audio
    "cache_implementation": "paged",
}
audio_values = model.generate(**inputs, **generate_kwargs)

# Convert to float32 numpy and write a WAV file at the model's native sampling rate
audio_values_np = audio_values.to(torch.float32).cpu().numpy().squeeze()
sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write(output_wav_path, rate=sampling_rate, data=audio_values_np)
print(f"Audio saved to {output_wav_path}")
```
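For comparison, the non-compiled `Original` model can be run with the stock Hugging Face API. Below is a minimal sketch using the standard `transformers` MusicGen interface; the prompt and output filename are illustrative.
```python
import torch
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("facebook/musicgen-large")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-large").to(device)

inputs = processor(
    text=["A groovy funk bassline with a tight drum beat"],
    padding=True,
    return_tensors="pt",
).to(device)

# Same generation settings as above, without the elastic-specific cache option
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3.0, max_new_tokens=256)

# Output shape is (batch, channels, samples); write the first sample to disk
sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write(
    "generated_audio_original.wav",
    rate=sampling_rate,
    data=audio_values[0, 0].cpu().numpy(),
)
```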
__System requirements:__
* GPUs: NVIDIA H100, L40S.
* CPU: AMD, Intel
* Python: 3.8-3.11 (check dependencies for specific versions)
To work with our elastic models and compilation tools, you'll need to install the `elastic_models` and `qlip` libraries from TheStage:
```shell
pip install thestage
pip install "elastic_models[nvidia]" \
    --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple \
    --extra-index-url https://pypi.nvidia.com \
    --extra-index-url https://pypi.org/simple
pip install flash-attn==2.7.3 --no-build-isolation
pip uninstall apex
```
Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token on your profile page. Set up the API token as follows:
```shell
thestage config set --api-token <YOUR_API_TOKEN>
```
Congrats, now you can use accelerated models and tools!
----
## Benchmarks
Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for MusicGen models using our algorithms.
The `Original` column in the latency benchmarks refers to the non-compiled Hugging Face `facebook/musicgen-large` model; `XL` is the compiled original without ANNA quantization.
### Latency benchmarks (Tokens Per Second - TPS)
Decoder-stage throughput when generating audio with `max_new_tokens = 256` (about 5 seconds of audio).
**Batch Size 1:**
| GPU Type | S | M | L | XL | Original |
|--------|---|---|---|----|----|
| H100 | 130.52 | 129.87 | 128.57 | 129.25 | 44.80 |
| L40S | 101.70 | 95.65 | 89.99 | 83.39 | 44.43 |
**Batch Size 16:**
| GPU Type | S | M | L | XL | Original |
|--------|---|---|---|----|----|
| H100 | 106.06 | 105.82 | 107.07 | 106.55 | 41.09 |
| L40S | 74.97 | 71.52 | 68.09 | 63.86 | 36.40 |
**Batch Size 32:**
| GPU Type | S | M | L | XL | Original |
|--------|---|---|---|----|----|
| H100 | 83.58 | 84.13 | 84.04 | 83.90 | 34.50 |
| L40S | 57.36 | 55.60 | 53.73 | 51.33 | 28.72 |
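The figures above are tokens per second of the MusicGen decoder. As a rough illustration (not the exact benchmark harness used for these tables), TPS can be estimated by timing `generate` and dividing the number of generated tokens by the elapsed wall-clock time:
```python
import time
import torch

def measure_tps(model, inputs, max_new_tokens=256, warmup=1, iters=3):
    """Rough tokens-per-second estimate for the MusicGen decoder.

    Illustrative only: a production benchmark would also control for
    batch padding, steady-state GPU clocks, and cache configuration.
    """
    # Warm-up runs so compilation and allocator effects are excluded
    for _ in range(warmup):
        model.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens)

    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    return iters * max_new_tokens / elapsed
```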
## Links
* __Platform__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X (Twitter)](https://x.com/TheStageAI)
* __Contact email__: [email protected]