TheStageAI/Elastic-musicgen-large

music-generation

psynote123 committed 1 day ago

Commit 4b0c1b5 · verified · 1 Parent(s): 79ee2b7

Update README.md

Files changed (1)

README.md +17 -17

README.md

@@ -35,7 +35,8 @@ __Goals of elastic models:__
 > It\'s important to note that specific quality degradation can vary. We aim for S models to retain high perceptual quality. The "Original" in tables refers to the non-compiled Hugging Face model, while "XL" is the compiled original. S, M, L are ANNA-quantized and compiled.
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/MTNdvFoKIB8sNx2Sf17ZS.png)
 ## Audio Examples
@@ -139,28 +140,27 @@ The `Original` column in latency benchmarks typically refers to the Hugging Face
 Performance for generating audio (decoder stage, max_new_tokens = 256 (5 seconds audio)).
-| GPU Type | S      | M      | L      | XL (Compiled Original) | Original (HF, non-compiled) |
-|----------|--------|--------|--------|------------------------|-----------------------------|
-| H100     | 122.75 | 124.70 | 126.21 | 126.71                 | 45.33       |
-| L40S     | 96.74  | 90.90  | 86.51  | 83.31                  | 44.69      |
-#### Performance by Batch Size
 **Batch Size 16:**
-| GPU Type | S Mode (TPS) | XL Mode (TPS) |
-|----------|--------------|---------------|
-| H100     | 94.21        | 97.96         |
-| L40S     | 69.66        | 63.19         |
-**Batch Size 32:**
-| GPU Type | S Mode (TPS) | XL Mode (TPS) |
-|----------|--------------|---------------|
-| H100     | 77.15        | 76.64         |
-| L40S     | 54.81        | 51.34         |
-> **Note:** Currently deployed models support only batch size = 1. Expect upcoming updates for larger batch size support.
-As shown in the results, smaller batch sizes typically demonstrate higher per-token performance, which is typical for inference tasks.
 ## Links

 > It\'s important to note that specific quality degradation can vary. We aim for S models to retain high perceptual quality. The "Original" in tables refers to the non-compiled Hugging Face model, while "XL" is the compiled original. S, M, L are ANNA-quantized and compiled.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/7kuTModQp4_5lRqR5QJ5P.png)
 ## Audio Examples
 Performance for generating audio (decoder stage, max_new_tokens = 256 (5 seconds audio)).
+**Batch Size 1:**
+| GPU Type | S | M | L | XL | Original |
+|--------|---|---|---|----|----|
+| H100 | 130.52 | 129.87 | 128.57 | 129.25 | 44.80 |
+| L40S | 101.70 | 95.65 | 89.99 | 83.39 | 44.43 |
 **Batch Size 16:**
+| GPU Type | S | M | L | XL | Original |
+|--------|---|---|---|----|----|
+| H100 | 106.06 | 105.82 | 107.07 | 106.55 | 41.09 |
+| L40S | 74.97 | 71.52 | 68.09 | 63.86 | 36.40 |
+**Batch Size 32:**
+| GPU Type | S | M | L | XL | Original |
+|--------|---|---|---|----|----|
+| H100 | 83.58 | 84.13 | 84.04 | 83.90 | 34.50 |
+| L40S | 57.36 | 55.60 | 53.73 | 51.33 | 28.72 |
 ## Links