Thanks for IQ4_NL
#1, opened by KeyboardMasher
Few people publish this quantization. It is very fast on ARM64 CPUs because it benefits from "repack", just as Q4_0 does.
Good for running small LLMs on an Android phone.
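
For context on why IQ4_NL can share a fast path with Q4_0, here is a minimal Python sketch of how the two block formats dequantize. It assumes the ggml layout as I understand it: 32 weights per block, stored as one scale plus 16 bytes of packed 4-bit indices. The codebook values are meant to mirror ggml's `kvalues_iq4nl` table and should be verified against the llama.cpp source. The function names are hypothetical, and this is not the repacked ARM kernel itself, only an illustration of the shared layout.

```python
# Assumed to match ggml's kvalues_iq4nl table; verify against the llama.cpp source.
KVALUES_IQ4NL = [-127, -104, -83, -65, -49, -35, -22, -10,
                 1, 13, 25, 38, 53, 69, 89, 113]
BLOCK_SIZE = 32  # weights per block in both formats (assumption stated above)


def unpack_nibbles(qs):
    """Split 16 packed bytes into 32 4-bit indices: all low nibbles, then all high nibbles."""
    assert len(qs) == BLOCK_SIZE // 2
    return [b & 0x0F for b in qs] + [b >> 4 for b in qs]


def dequant_q4_0(d, qs):
    """Q4_0: linear mapping, (index - 8) times the block scale d."""
    return [d * (q - 8) for q in unpack_nibbles(qs)]


def dequant_iq4_nl(d, qs):
    """IQ4_NL: same packed indices, but each one is looked up in a non-linear codebook."""
    return [d * KVALUES_IQ4NL[q] for q in unpack_nibbles(qs)]


if __name__ == "__main__":
    block = bytes([0x80] * 16)  # low nibble 0, high nibble 8 in every byte
    print(dequant_q4_0(1.0, block)[:2], dequant_q4_0(1.0, block)[16:18])
    print(dequant_iq4_nl(1.0, block)[:2], dequant_iq4_nl(1.0, block)[16:18])
```

Since the only difference is the final index-to-value mapping, the same byte layout (and presumably the same interleaving of blocks for SIMD) can serve both formats.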