is this per-channel int4 quantized?
#1, opened by anemll
https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
The tech report says:
"Based on the most popular open source quantization inference engines (e.g. llama.cpp), we focus on three weight representations: per-channel int4, per-block int4, and switched fp8. In Table 3, we report the memory filled by raw...."
Does this mean these weights use the "per-channel int4" scheme described there? Q4_0 is clearly block-32, per this:
https://github.com/ggml-org/llama.cpp/wiki/Tensor-Encoding-Schemes
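For context, the difference between the two schemes comes down to how many weights share one scale factor. A toy NumPy sketch of symmetric int4 quantization (my own illustration, not llama.cpp's actual Q4_0 layout, which also stores an fp16 scale per block and packs nibbles):

```python
import numpy as np

def quantize_int4(w, group):
    """Symmetric int4 quantization: one scale per group of `group` values."""
    w = w.reshape(-1, group)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)  # toy weight matrix

# "Per-channel": one scale per output row (64 weights per scale here).
q_pc, s_pc = quantize_int4(w, w.shape[1])

# Q4_0-style "per-block": one scale per 32 consecutive weights.
q_blk, s_blk = quantize_int4(w, 32)

print(s_pc.size, s_blk.size)  # 4 scales vs. 8 scales for this 4x64 matrix
```

So per-channel stores far fewer scales but quantizes more coarsely; block-32 (Q4_0) trades extra scale storage for better local fit. That is why it would matter which one the report's numbers actually describe.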