F16 is the best option. F32 is just too slow (on an RTX 3060M 6GB). Q8_0 is faster than F16 and sometimes even produces better results than F16. Below Q8 there can be a big tradeoff in quality: Q3 showed some very, very bad hallucinations.
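For reference, a minimal sketch of loading a GGUF variant with llama-cpp-python. The filename and parameters below are assumptions for illustration, not taken from this repo:

```python
from llama_cpp import Llama

# Load the F16 variant (best quality per the note above); swap in the
# Q8_0 file for faster inference at near-identical quality.
llm = Llama(
    model_path="Cascade0-159M-Instruct-45k-F16.gguf",  # hypothetical filename
    n_ctx=2048,       # context window size
    n_gpu_layers=-1,  # offload all layers to GPU (e.g., an RTX 3060M 6GB)
)

out = llm("Explain GGUF quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```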
Model tree for ARMZyany/Cascade0-159M-Instruct-45k-GGUF
Base model: ARMZyany/Cascade0-159M-Instruct-45k