Update README.md
README.md CHANGED
@@ -27,13 +27,16 @@ library_name: transformers
 &lt;/div&gt;
 &lt;h1 style="margin-top: 0rem;"&gt;🌙 Kimi K2 Usage Guidelines&lt;/h1&gt;
 &lt;/div&gt;
+
+- To run, you must use llama.cpp [PR #14654](https://github.com/ggml-org/llama.cpp/pull/14654) or [our llama.cpp fork](https://github.com/unslothai/llama.cpp) (easier)
+- For complete detailed instructions, see our guide: [docs.unsloth.ai/basics/kimi-k2](https://docs.unsloth.ai/basics/kimi-k2)
+
 It is recommended to have at least 128GB of unified RAM to run the small quants. With 16GB VRAM and 256GB RAM, expect 5+ tokens/sec.
 For best results, use any 2-bit XL quant or above.

 Set the temperature to 0.6 (recommended) to reduce repetition and incoherence.

-
-- For complete detailed instructions, see our guide: [docs.unsloth.ai/basics/kimi-k2](https://docs.unsloth.ai/basics/kimi-k2)
+---

 &lt;div align="center"&gt;
 &lt;picture&gt;
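The added guidelines (build the llama.cpp PR or fork, pick a 2-bit XL quant or larger, set the temperature to 0.6) amount to a single llama.cpp invocation. The sketch below is illustrative and not part of the diff: the GGUF filename, context size, and `-ngl` value are placeholder assumptions to adapt to your own quant and VRAM, while `-m`, `--temp`, `--ctx-size`, and `-ngl` are standard llama.cpp flags.

```bash
# Minimal sketch, assuming the llama.cpp fork/PR above has already been built.
# The model path and -ngl value are placeholders; match them to the quant you
# downloaded (2-bit XL or above) and to how many layers fit in your VRAM.
./llama-cli \
  -m ./Kimi-K2-Q2_K_XL.gguf \
  --temp 0.6 \
  --ctx-size 16384 \
  -ngl 99
```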