is this per-channel int4 quantized?
#1, opened by anemll
https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
The tech report says:
"Based on the most popular open source quantization inference engines (e.g. llama.cpp), we focus on three weight representations: per-channel int4, per-block int4, and switched fp8. In Table 3, we report the memory filled by raw...."
Does this mean these weights use the "per-channel int4" scheme described there? Q4_0 is clearly block-32, per this:
https://github.com/ggml-org/llama.cpp/wiki/Tensor-Encoding-Schemes
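For context, the difference between the two schemes comes down to how many weights share one scale factor. A toy NumPy sketch of symmetric int4 quantization (my own illustration, not llama.cpp's actual Q4_0 layout, which also stores an fp16 scale per block and packs nibbles):

```python
import numpy as np

def quantize_int4(w, group):
    """Symmetric int4 quantization: one scale per group of `group` values."""
    w = w.reshape(-1, group)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)  # toy weight matrix

# "Per-channel": one scale per output row (64 weights per scale here).
q_pc, s_pc = quantize_int4(w, w.shape[1])

# Q4_0-style "per-block": one scale per 32 consecutive weights.
q_blk, s_blk = quantize_int4(w, 32)

print(s_pc.size, s_blk.size)  # 4 scales vs. 8 scales for this 4x64 matrix
```

So per-channel stores far fewer scales but quantizes more coarsely; block-32 (Q4_0) trades extra scale storage for better local fit. That is why it would matter which one the report's numbers actually describe.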