Update README.md
README.md
CHANGED
@@ -295,10 +295,11 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
 | | Phi-4 mini-Ins | Phi-4-mini-instruct-FP8 |
 | latency (batch_size=1) | 1.61s | 1.25s (1.29x speedup) |
 | latency (batch_size=256) | 5.16s | 4.89s (1.05x speedup) |
-| serving (num_prompts=1) | 1.37 req/s | 1.
-| serving (num_prompts=1000) |
+| serving (num_prompts=1) | 1.37 req/s | 1.66 req/s (1.21x speedup) |
+| serving (num_prompts=1000) | 62.55 req/s | 72.56 req/s (1.16x speedup) |
 
 Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
+Note the result is not using fbgemm kernels, (no `fbgemm-gpu-genai` installed), fbgemm kernels has less speedup when num_prompts is 1000 currently.
 
 <details>
 <summary> Reproduce Model Performance Results </summary>
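
The speedup figures in the table can be sanity-checked with a few lines of arithmetic. The sketch below assumes (the diff does not state the formula) that speedup is baseline / FP8 for latency, where lower is better, and FP8 / baseline for serving throughput, where higher is better. The helper names are hypothetical, and the inputs are the rounded values shown in the table, so the recomputed ratios may differ from the published ones in the last digit (for example, batch_size=256 comes out near 1.055 while the table reports 1.05x).

```python
# Rounded values copied from the table in the diff: (baseline, FP8).
latency_s = {
    "batch_size=1": (1.61, 1.25),
    "batch_size=256": (5.16, 4.89),
}
throughput_req_s = {
    "num_prompts=1": (1.37, 1.66),
    "num_prompts=1000": (62.55, 72.56),
}


def latency_speedup(baseline: float, quantized: float) -> float:
    """Assumed formula: lower latency is better, so divide baseline by FP8."""
    return baseline / quantized


def throughput_speedup(baseline: float, quantized: float) -> float:
    """Assumed formula: higher throughput is better, so divide FP8 by baseline."""
    return quantized / baseline


speedups = {}
for name, (base, fp8) in latency_s.items():
    speedups[f"latency ({name})"] = latency_speedup(base, fp8)
for name, (base, fp8) in throughput_req_s.items():
    speedups[f"serving ({name})"] = throughput_speedup(base, fp8)

for name, value in speedups.items():
    print(f"{name}: {value:.3f}x")
```

Keeping the two formulas separate matters because the two benchmarks report in opposite directions: applying the latency formula to the req/s rows would yield ratios below 1 and make the FP8 model look slower.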