---
license: apache-2.0
base_model:
- facebook/musicgen-large
base_model_relation: quantized
pipeline_tag: text-to-audio
language:
- en
tags:
- text-to-audio
- music-generation
- pytorch
- annthem
- qlip
- thestage
---

# Elastic model: MusicGen Large. Fastest and most flexible models for self-serving.

# Attention: this page is for informational purposes only. To use the models, you need to wait for an update of the `elastic_models` package!

Elastic models are the models produced by TheStage AI ANNA: Automated Neural Networks Accelerator. ANNA allows you to control model size, latency and quality with a simple slider movement. For each model, ANNA produces a series of optimized models:

* __XL__: Mathematically equivalent neural network (the original `facebook/musicgen-large`, compiled), optimized with our DNN compiler.
* __L__: Near-lossless model, with minimal degradation on the corresponding audio quality benchmarks.
* __M__: Faster model, with minor and acceptable accuracy degradation.
* __S__: The fastest model, with slight accuracy degradation, offering the best speed.
* __Original__: The original `facebook/musicgen-large` model from Hugging Face, without QLIP compilation.

__Goals of elastic models:__

* Provide flexibility in cost vs quality selection for inference
* Provide clear quality and latency benchmarks for audio generation
* Provide the interface of HF libraries (`transformers` and `elastic_models`) with a single line of code change for using optimized versions
* Provide models supported on a wide range of hardware (NVIDIA GPUs), which are pre-compiled and require no JIT
* Provide the best models and service for self-hosting

> It's important to note that specific quality degradation can vary. We aim for S models to retain high perceptual quality. "Original" in the tables refers to the non-compiled Hugging Face model, while "XL" is the compiled original. S, M and L are ANNA-quantized and compiled.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/7kuTModQp4_5lRqR5QJ5P.png)

## Audio Examples

Below are a few examples demonstrating the audio quality of the different Elastic MusicGen Large versions. These samples were generated on an NVIDIA H100 GPU with a duration of 20 seconds each. For a more comprehensive set of examples and interactive demos, please visit [musicgen.thestage.ai](http://music.thestage.ai).

**Prompt:** "Calm lofi hip hop track with a simple piano melody and soft drums" (Audio: 20 seconds, H100 GPU)

| S | M | L | XL (Compiled Original) | Original (HF Non-Compiled) |
|---|---|---|------------------------|----------------------------|
|   |   |   |                        |                            |

-----

## Inference

To infer our MusicGen models, you primarily use the `elastic_models.transformers.MusicgenForConditionalGeneration` class. If you have compiled engines, you provide the path to them. Otherwise, for non-compiled or original models, you can use the standard Hugging Face `transformers.MusicgenForConditionalGeneration`, as in the sketch below.
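For reference, here is a minimal sketch of that baseline path through stock `transformers` (standard Hugging Face API only, no `elastic_models`-specific arguments such as `mode`); the prompt and output filename are illustrative:

```python
# Baseline: original facebook/musicgen-large via stock Hugging Face transformers
# (no compiled engines, no ANNA quantization).
import torch
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("facebook/musicgen-large")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-large").to(device)
model.eval()

inputs = processor(
    text=["A groovy funk bassline with a tight drum beat"],  # illustrative prompt
    padding=True,
    return_tensors="pt",
).to(device)

# Same sampling settings as the elastic example below, minus the paged cache.
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3.0, max_new_tokens=256)

sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write(
    "generated_audio_original.wav",
    rate=sampling_rate,
    data=audio_values[0, 0].cpu().numpy(),  # (batch, channels, samples) -> mono waveform
)
```

Moving to the optimized versions is then the import swap plus the `mode` argument shown in the example below.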
**Example using `elastic_models` with a compiled model:**

```python
import torch
import scipy.io.wavfile
from transformers import AutoProcessor
from elastic_models.transformers import MusicgenForConditionalGeneration

model_name_hf = "facebook/musicgen-large"
elastic_mode = "S"  # one of "S", "M", "L", "XL"
prompt = "A groovy funk bassline with a tight drum beat"
output_wav_path = "generated_audio_elastic_S.wav"
hf_token = "YOUR_TOKEN"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = AutoProcessor.from_pretrained(model_name_hf, token=hf_token)
model = MusicgenForConditionalGeneration.from_pretrained(
    model_name_hf,
    token=hf_token,
    torch_dtype=torch.float16,
    mode=elastic_mode,
    device=device,
).to(device)
model.eval()

inputs = processor(
    text=[prompt],
    padding=True,
    return_tensors="pt",
).to(device)

print(f"Generating audio for: {prompt}...")
generate_kwargs = {
    "do_sample": True,
    "guidance_scale": 3.0,
    "max_new_tokens": 256,
    "cache_implementation": "paged",
}
audio_values = model.generate(**inputs, **generate_kwargs)

# Convert to float32 numpy and write a WAV file at the model's native sampling rate.
audio_values_np = audio_values.to(torch.float32).cpu().numpy().squeeze()
sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write(output_wav_path, rate=sampling_rate, data=audio_values_np)
print(f"Audio saved to {output_wav_path}")
```

__System requirements:__

* GPUs: NVIDIA H100, L40S
* CPU: AMD, Intel
* Python: 3.8-3.11 (check dependencies for specific versions)

To work with our elastic models and compilation tools, you'll need to install the `elastic_models` and `qlip` libraries from TheStage:

```shell
pip install thestage
pip install elastic_models[nvidia] \
  --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple \
  --extra-index-url https://pypi.nvidia.com \
  --extra-index-url https://pypi.org/simple

pip install flash-attn==2.7.3 --no-build-isolation
pip uninstall apex
```

Then go to [app.thestage.ai](https://app.thestage.ai), log in and generate an API token from your profile page. Set up the API token as follows:

```shell
thestage config set --api-token
```

Congrats, now you can use accelerated models and tools!

----

## Benchmarks

Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for MusicGen models using our algorithms. In the latency tables below, the `Original` column refers to the non-compiled Hugging Face `facebook/musicgen-large` model, while `XL` is the same model compiled without ANNA quantization.

### Latency benchmarks (Tokens Per Second - TPS)

Performance for generating audio (decoder stage, max_new_tokens = 256, i.e. about 5 seconds of audio).

**Batch Size 1:**

| GPU Type | S | M | L | XL | Original |
|----------|--------|--------|--------|--------|----------|
| H100     | 130.52 | 129.87 | 128.57 | 129.25 | 44.80    |
| L40S     | 101.70 | 95.65  | 89.99  | 83.39  | 44.43    |

**Batch Size 16:**

| GPU Type | S | M | L | XL | Original |
|----------|--------|--------|--------|--------|----------|
| H100     | 106.06 | 105.82 | 107.07 | 106.55 | 41.09    |
| L40S     | 74.97  | 71.52  | 68.09  | 63.86  | 36.40    |

**Batch Size 32:**

| GPU Type | S | M | L | XL | Original |
|----------|--------|--------|--------|--------|----------|
| H100     | 83.58  | 84.13  | 84.04  | 83.90  | 34.50    |
| L40S     | 57.36  | 55.60  | 53.73  | 51.33  | 28.72    |

## Links

* __Platform__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X (Twitter)](https://x.com/TheStageAI)
* __Contact email__: contact@thestage.ai