Update README.md
README.md CHANGED
@@ -14,6 +14,12 @@ This creates models that are degraded little or not at all and have a smaller size.
They run at about 3-6 t/sec on CPU only using llama.cpp
And obviously faster on computers with potent GPUs

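For reference, a CPU-only run like the one quoted above is launched through llama.cpp's command-line binary. The invocation below is only a sketch: it assumes a recent llama.cpp build where the CLI is named `llama-cli` (older builds ship it as `main`), and `model.f16.q6.gguf` stands in for whichever quantized file you download from one of the repositories listed further down.

```sh
# Sketch of a CPU-only generation with llama.cpp (binary name and model filename
# are assumptions, not taken from this repo).
#   -m  path to the GGUF model
#   -p  prompt text
#   -n  maximum number of tokens to generate
#   -t  number of CPU threads (tune to your physical core count)
./llama-cli -m model.f16.q6.gguf -p "Summarize what GGUF quantization is." -n 256 -t 8
```

Builds compiled with GPU support can additionally offload layers (the `-ngl` flag), which is where the speedup on machines with potent GPUs comes from.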
+ALL the models were quantized in this way:
+quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q5.gguf q5_k
+quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
+quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q8.gguf q8_0
+and there is also a pure f16 in every directory.
+
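The three commands above keep the output tensor and the token embeddings at f16 while requantizing the remaining weights to q5_k, q6_k, or q8_0, which is what keeps the quality loss small. They all start from a `model.f16.gguf`; as a point of reference, that pure-f16 file is normally produced with llama.cpp's HF-to-GGUF converter. The snippet below is only a sketch under that assumption (the script name and flags vary between llama.cpp versions, and the input directory is a placeholder, not a path from this repo):

```sh
# Sketch: produce the pure f16 GGUF that the quantize commands above start from.
# Assumes a recent llama.cpp checkout; older trees name the script convert-hf-to-gguf.py.
python convert_hf_to_gguf.py ./Yi-1.5-6B-Chat \
    --outtype f16 \
    --outfile model.f16.gguf
```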
* [ZeroWw/Llama-3-8B-Instruct-Gradient-1048k-GGUF](https://huggingface.co/ZeroWw/Llama-3-8B-Instruct-Gradient-1048k-GGUF)
* [ZeroWw/Pythia-Chat-Base-7B-GGUF](https://huggingface.co/ZeroWw/Pythia-Chat-Base-7B-GGUF)
* [ZeroWw/Yi-1.5-6B-Chat-GGUF](https://huggingface.co/ZeroWw/Yi-1.5-6B-Chat-GGUF)