Update README.md
README.md
CHANGED
@@ -13,6 +13,14 @@ For more details about SwiftKV and how to use it:
 * π [SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation (arXiv)](https://arxiv.org/abs/2410.03960)
 * π [Getting started guide](https://github.com/Snowflake-Labs/vllm/tree/swiftkv/examples/swiftkv)
 
+## Performance Metrics
+
+Combined input and output throughput for Llama 3.1 405B across a range of input lengths.
+<img src="figure-4.png" alt="performance plot of llama-405B w. swiftkv" width="400">
+Legend: blue - baseline FP8, pink - SwiftKV FP8<br>
+
+
+
 ## Eval Metrics
 
 For a full breakdown on evaluation metrics and performance impact please refer to our [blog](https://www.snowflake.com/engineering-blog/swiftkv-llm-compute-reduction/) and [arXiv paper]((https://arxiv.org/abs/2410.03960)) but below we've outlined some relevant evaluation metrics.