Update README.md
README.md
CHANGED
@@ -2,4 +2,7 @@
 license: apache-2.0
 base_model:
 - ARMZyany/Cascade0-159M-Instruct-45k
----
+---
+F16 is the best option. F32 is just too slow (on an RTX 3060M 6GB).
+Q8_0 is faster than F16 and does sometimes produce better results than F16.
+Anything under Q8 might be a big tradeoff in quality. Q3 showed some very, very bad hallucinations.