psynote123 committed on
Commit 4b0c1b5 · verified · 1 Parent(s): 79ee2b7

Update README.md

Files changed (1)
  1. README.md +17 -17
README.md CHANGED
@@ -35,7 +35,8 @@ __Goals of elastic models:__
 
  > It\'s important to note that specific quality degradation can vary. We aim for S models to retain high perceptual quality. The "Original" in tables refers to the non-compiled Hugging Face model, while "XL" is the compiled original. S, M, L are ANNA-quantized and compiled.
 
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/MTNdvFoKIB8sNx2Sf17ZS.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/7kuTModQp4_5lRqR5QJ5P.png)
 
  ## Audio Examples
 
@@ -139,28 +140,27 @@ The `Original` column in latency benchmarks typically refers to the Hugging Face
 
  Performance for generating audio (decoder stage, max_new_tokens = 256 (5 seconds audio)).
 
- | GPU Type | S | M | L | XL (Compiled Original) | Original (HF, non-compiled) |
- |----------|--------|--------|--------|------------------------|-----------------------------|
- | H100 | 122.75 | 124.70 | 126.21 | 126.71 | 45.33 |
- | L40S | 96.74 | 90.90 | 86.51 | 83.31 | 44.69 |
 
- #### Performance by Batch Size
+ **Batch Size 1:**
+
+ | GPU Type | S | M | L | XL | Original |
+ |--------|---|---|---|----|----|
+ | H100 | 130.52 | 129.87 | 128.57 | 129.25 | 44.80 |
+ | L40S | 101.70 | 95.65 | 89.99 | 83.39 | 44.43 |
 
  **Batch Size 16:**
- | GPU Type | S Mode (TPS) | XL Mode (TPS) |
- |----------|--------------|---------------|
- | H100 | 94.21 | 97.96 |
- | L40S | 69.66 | 63.19 |
 
- **Batch Size 32:**
- | GPU Type | S Mode (TPS) | XL Mode (TPS) |
- |----------|--------------|---------------|
- | H100 | 77.15 | 76.64 |
- | L40S | 54.81 | 51.34 |
+ | GPU Type | S | M | L | XL | Original |
+ |--------|---|---|---|----|----|
+ | H100 | 106.06 | 105.82 | 107.07 | 106.55 | 41.09 |
+ | L40S | 74.97 | 71.52 | 68.09 | 63.86 | 36.40 |
 
- > **Note:** Currently deployed models support only batch size = 1. Expect upcoming updates for larger batch size support.
+ **Batch Size 32:**
 
- As shown in the results, smaller batch sizes typically demonstrate higher per-token performance, which is typical for inference tasks.
+ | GPU Type | S | M | L | XL | Original |
+ |--------|---|---|---|----|----|
+ | H100 | 83.58 | 84.13 | 84.04 | 83.90 | 34.50 |
+ | L40S | 57.36 | 55.60 | 53.73 | 51.33 | 28.72 |
 
 
  ## Links
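
As a reading aid for the tables this commit adds, the sketch below computes how much faster each elastic mode is than the `Original` column. The values are copied from the added `Batch Size 1`, `Batch Size 16`, and `Batch Size 32` tables. The unit is an assumption: the new tables do not state one, while the removed tables labelled their columns `TPS`. The names `NEW_TABLES` and `speedup` are hypothetical and not part of the README.

```python
# Throughput values copied from the tables added in this commit.
# Assumption: numbers are tokens per second (the removed tables said "TPS").
NEW_TABLES = {
    1: {
        "H100": {"S": 130.52, "M": 129.87, "L": 128.57, "XL": 129.25, "Original": 44.80},
        "L40S": {"S": 101.70, "M": 95.65, "L": 89.99, "XL": 83.39, "Original": 44.43},
    },
    16: {
        "H100": {"S": 106.06, "M": 105.82, "L": 107.07, "XL": 106.55, "Original": 41.09},
        "L40S": {"S": 74.97, "M": 71.52, "L": 68.09, "XL": 63.86, "Original": 36.40},
    },
    32: {
        "H100": {"S": 83.58, "M": 84.13, "L": 84.04, "XL": 83.90, "Original": 34.50},
        "L40S": {"S": 57.36, "M": 55.60, "L": 53.73, "XL": 51.33, "Original": 28.72},
    },
}


def speedup(batch_size, gpu, mode="S", baseline="Original"):
    """Return the throughput ratio mode / baseline for one table row."""
    row = NEW_TABLES[batch_size][gpu]
    return row[mode] / row[baseline]


if __name__ == "__main__":
    # Print the S-vs-Original ratio for every batch size and GPU.
    for batch_size, gpus in NEW_TABLES.items():
        for gpu in gpus:
            ratio = speedup(batch_size, gpu)
            print(f"batch={batch_size:<2} {gpu}: S is {ratio:.2f}x Original")
```

The ratio is only as meaningful as the unit assumption above; if the columns turn out to be latencies rather than throughput, the division would need to be inverted.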