ZeroWw committed on
Commit b619c9e · verified · 1 Parent(s): ede6619

Update README.md

Files changed (1): README.md (+6 -0)
README.md CHANGED
@@ -14,6 +14,12 @@ This creates models that are little or not degraded at all and have a smaller si
  They run at about 3-6 t/sec on CPU only using llama.cpp
  And obviously faster on computers with potent GPUs

+ ALL the models were quantized in this way:
+ quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q5.gguf q5_k
+ quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
+ quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q8.gguf q8_0
+ and there is also a pure f16 in every directory.
+
  * [ZeroWw/Llama-3-8B-Instruct-Gradient-1048k-GGUF](https://huggingface.co/ZeroWw/Llama-3-8B-Instruct-Gradient-1048k-GGUF)
  * [ZeroWw/Pythia-Chat-Base-7B-GGUF](https://huggingface.co/ZeroWw/Pythia-Chat-Base-7B-GGUF)
  * [ZeroWw/Yi-1.5-6B-Chat-GGUF](https://huggingface.co/ZeroWw/Yi-1.5-6B-Chat-GGUF)
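
The three commands in the diff differ only in the target quantization type, so the same recipe can be written as a loop. A minimal sketch, assuming a POSIX shell and a llama.cpp quantize binary on PATH (the commit uses the Windows name quantize.exe; newer llama.cpp builds ship the tool as llama-quantize), with model.f16.gguf as the full-precision input taken from the commit above:

# Minimal sketch: reproduce the quantization recipe from this commit.
# Assumes model.f16.gguf exists and the llama.cpp quantize tool is on PATH.
for qtype in q5_k q6_k q8_0; do
  # Keep the output tensor and token embeddings at f16; quantize the rest.
  # ${qtype%_*} strips the suffix, yielding q5 / q6 / q8 for the file name.
  quantize --allow-requantize \
           --output-tensor-type f16 \
           --token-embedding-type f16 \
           model.f16.gguf "model.f16.${qtype%_*}.gguf" "$qtype"
done

The --output-tensor-type f16 and --token-embedding-type f16 flags hold those two tensor groups at full f16 precision while the remaining weights are quantized, which is the point of this recipe: the files stay close to the f16 original in quality while shrinking in size.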