---
license: apache-2.0
base_model:
- facebook/musicgen-large
base_model_relation: quantized
pipeline_tag: text-to-audio
language:
- en
tags:
- text-to-audio
- music-generation
- pytorch
- annthem
- qlip
- thestage
---
# Elastic model: MusicGen Large. Fastest and most flexible models for self-serving.
# Attention: this page is for informational purposes only. To use these models, please wait for the upcoming update of the `elastic_models` package!
Elastic models are the models produced by TheStage AI ANNA: Automated Neural Networks Accelerator. ANNA allows you to control model size, latency and quality with a simple slider movement. For each model, ANNA produces a series of optimized models:
* __XL__: Mathematically equivalent neural network (original compiled `facebook/musicgen-large`), optimized with our DNN compiler.
* __L__: Near lossless model, with minimal degradation obtained on corresponding audio quality benchmarks.
* __M__: Faster model, with minor and acceptable accuracy degradation.
* __S__: The fastest model, with slight accuracy degradation, offering the best speed.
* __Original__: The original `facebook/musicgen-large` model from Hugging Face, without QLIP compilation.
__Goals of elastic models:__
* Provide flexibility in cost vs quality selection for inference
* Provide clear quality and latency benchmarks for audio generation
* Provide the familiar interface of HF libraries: `transformers` and `elastic_models`, so that switching to an optimized version requires a single line of code change (see the snippet after this list).
* Provide models supported on a wide range of hardware (NVIDIA GPUs), which are pre-compiled and require no JIT.
* Provide the best models and service for self-hosting.
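For example, a minimal sketch of that one-line change, based on the full inference example below (the `mode` argument selects the S/M/L/XL variant):
```python
# Original Hugging Face import:
# from transformers import MusicgenForConditionalGeneration
# Elastic replacement (one-line change):
from elastic_models.transformers import MusicgenForConditionalGeneration

model = MusicgenForConditionalGeneration.from_pretrained(
    "facebook/musicgen-large",
    mode="S",  # one of "S", "M", "L", "XL"
)
```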
> It's important to note that the specific quality degradation can vary. We aim for S models to retain high perceptual quality. "Original" in the tables refers to the non-compiled Hugging Face model, while "XL" is the compiled original. S, M and L are ANNA-quantized and compiled.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/7kuTModQp4_5lRqR5QJ5P.png)
## Audio Examples
Below are a few examples demonstrating the audio quality of the different Elastic MusicGen Large versions. These samples were generated on an NVIDIA H100 GPU with a duration of 20 seconds each. For a more comprehensive set of examples and interactive demos, please visit [musicgen.thestage.ai](http://music.thestage.ai).
**Prompt:** "Calm lofi hip hop track with a simple piano melody and soft drums" (Audio: 20 seconds, H100 GPU)
| S | M | L | XL (Compiled Original) | Original (HF Non-Compiled) |
|----------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/S82_oagiYy2r00ZYpBJ3Q.mpga"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/n7RWM2q3YHUE0oA-oiISy.mpga"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/LBnfVjM2jNEqndVhBnXok.mpga"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/TYINxt_EcH-60oHMnO-B0.mpga"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/IKxeZ2LVYNsrjeNE9B7vS.mpga"></audio> |
-----
## Inference
To run inference with our MusicGen models, use the `elastic_models.transformers.MusicgenForConditionalGeneration` class. If you have compiled engines, provide the path to them; otherwise, for non-compiled or original models, you can use the standard Hugging Face `transformers.MusicgenForConditionalGeneration`.
**Example using `elastic_models` with a compiled model:**
```python
import torch
import scipy.io.wavfile
from transformers import AutoProcessor
from elastic_models.transformers import MusicgenForConditionalGeneration
model_name_hf = "facebook/musicgen-large"
elastic_mode = "S"  # one of "S", "M", "L", "XL"
prompt = "A groovy funk bassline with a tight drum beat"
output_wav_path = "generated_audio_elastic_S.wav"
hf_token = "YOUR_TOKEN"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Text processor and elastic (quantized + compiled) model
processor = AutoProcessor.from_pretrained(model_name_hf, token=hf_token)
model = MusicgenForConditionalGeneration.from_pretrained(
    model_name_hf,
    token=hf_token,
    torch_dtype=torch.float16,
    mode=elastic_mode,
    device=device,
).to(device)
model.eval()

# Tokenize the text prompt
inputs = processor(
    text=[prompt],
    padding=True,
    return_tensors="pt",
).to(device)

print(f"Generating audio for: {prompt}...")
generate_kwargs = {"do_sample": True, "guidance_scale": 3.0, "max_new_tokens": 256, "cache_implementation": "paged"}
audio_values = model.generate(**inputs, **generate_kwargs)

# Convert to float32 numpy and save as WAV at the audio codec's sampling rate
audio_values_np = audio_values.to(torch.float32).cpu().numpy().squeeze()
sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write(output_wav_path, rate=sampling_rate, data=audio_values_np)
print(f"Audio saved to {output_wav_path}")
```
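For comparison, the non-compiled `Original` model can be run with the standard Hugging Face class mentioned above. A minimal sketch (same prompt and saving logic as in the example, no `mode` argument):
```python
import torch
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Standard Hugging Face path for the non-compiled Original model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
processor = AutoProcessor.from_pretrained("facebook/musicgen-large")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-large").to(device)

inputs = processor(
    text=["A groovy funk bassline with a tight drum beat"],
    padding=True,
    return_tensors="pt",
).to(device)
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3.0, max_new_tokens=256)

sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write(
    "generated_audio_original.wav",
    rate=sampling_rate,
    data=audio_values.cpu().numpy().squeeze(),
)
```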
__System requirements:__
* GPUs: NVIDIA H100, L40S.
* CPU: AMD, Intel
* Python: 3.8-3.11 (check dependencies for specific versions)
To work with our elastic models and compilation tools, you'll need to install the `elastic_models` and `qlip` libraries from TheStage:
```shell
pip install thestage
pip install "elastic_models[nvidia]" \
  --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple \
  --extra-index-url https://pypi.nvidia.com \
  --extra-index-url https://pypi.org/simple
pip install flash-attn==2.7.3 --no-build-isolation
pip uninstall apex
```
Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token from your profile page. Set up the API token as follows:
```shell
thestage config set --api-token <YOUR_API_TOKEN>
```
Congrats, now you can use accelerated models and tools!
----
## Benchmarks
Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for MusicGen models using our algorithms.
In the latency tables, `XL` refers to the Hugging Face `facebook/musicgen-large` model compiled without ANNA quantization, while `Original` is the non-compiled Hugging Face model.
### Latency benchmarks (Tokens Per Second - TPS)
Performance of the audio generation decoder stage with `max_new_tokens = 256` (roughly 5 seconds of audio).
**Batch Size 1:**
| GPU Type | S | M | L | XL | Original |
|--------|---|---|---|----|----|
| H100 | 130.52 | 129.87 | 128.57 | 129.25 | 44.80 |
| L40S | 101.70 | 95.65 | 89.99 | 83.39 | 44.43 |
**Batch Size 16:**
| GPU Type | S | M | L | XL | Original |
|--------|---|---|---|----|----|
| H100 | 106.06 | 105.82 | 107.07 | 106.55 | 41.09 |
| L40S | 74.97 | 71.52 | 68.09 | 63.86 | 36.40 |
**Batch Size 32:**
| GPU Type | S | M | L | XL | Original |
|--------|---|---|---|----|----|
| H100 | 83.58 | 84.13 | 84.04 | 83.90 | 34.50 |
| L40S | 57.36 | 55.60 | 53.73 | 51.33 | 28.72 |
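Decoder TPS here is simply the number of generated tokens divided by wall-clock generation time. A minimal measurement sketch (a hypothetical helper, not the exact harness used to produce the numbers above):
```python
import time
import torch

def measure_decoder_tps(model, inputs, max_new_tokens=256, warmup=1, runs=3):
    """Rough TPS estimate: generated tokens divided by wall-clock generation time."""
    for _ in range(warmup):  # warm up kernels and caches
        model.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return runs * max_new_tokens / elapsed
```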
## Links
* __Platform__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X (Twitter)](https://x.com/TheStageAI)
* __Contact email__: [email protected]