Update README.md
README.md
@@ -28,7 +28,8 @@ By default, this model caps out at 32K context. Additional configuration is requ
     "factor": 4.0,
     "original_max_position_embeddings": 32768,
     "type": "yarn"
-}
+}
+```
 
 Once this is done, you can push the model to 64K context at Q4 KV cache quantization on a single 24GB VRAM card with minimal loss of accuracy.
 
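For reference, the fragment being patched looks like the tail of a `rope_scaling` block in the model's `config.json`. A minimal sketch of how the full block would read, assuming a Hugging Face-style config (the enclosing `"rope_scaling"` key is an assumption; only the three inner fields and the closing brace appear in the diff):

```json
"rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
}
```

With a factor of 4.0 applied to a 32768-token base, the scaled window reaches 131072 tokens, so running at the 64K context mentioned in the README leaves headroom rather than sitting at the limit.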