iq2_bn with 4 or 5 weights per byte?

#2
by TobDeBer - opened

Does iq2_bn encode 4 or 5 weights into each byte? I think it is 4.

If so, then this format would be more efficient:
// 1.6875 bpw
//typedef struct {
//     uint8_t qs[(QK_K - 4 * QK_K / 64) / 5]; // 5 elements per byte (3^5 = 243 < 256)
//     uint8_t qh[QK_K/64]; // 4 elements per byte
//     ggml_half d;
//} block_tq1_0;
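
To check the arithmetic behind the "1.6875 bpw" comment, here is a rough sketch, not the actual ggml code. It assumes QK_K = 256 and a 2-byte ggml_half, and it uses a plain base-3 encoding to show why 5 ternary weights fit in one byte; the real kernels may lay the digits out differently. The helper names pack5, unpack5 and tq1_0_bpw are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define QK_K 256            /* assumption: super-block size of 256 weights */
#define SIZEOF_GGML_HALF 2  /* assumption: ggml_half is a 16-bit float */

/* Hypothetical helper: pack five ternary digits (each 0, 1 or 2) into one
 * byte as a base-3 number. The largest value is 3^5 - 1 = 242, so it fits. */
static uint8_t pack5(const uint8_t t[5]) {
    unsigned b = 0;
    for (int i = 4; i >= 0; --i) {
        assert(t[i] < 3);
        b = b * 3 + t[i];
    }
    return (uint8_t)b;
}

/* Hypothetical helper: inverse of pack5, digit 0 comes out first. */
static void unpack5(uint8_t b, uint8_t t[5]) {
    for (int i = 0; i < 5; ++i) {
        t[i] = (uint8_t)(b % 3);
        b = (uint8_t)(b / 3);
    }
}

/* Bits per weight of the struct quoted above, computed from its field sizes. */
static double tq1_0_bpw(void) {
    int qs_bytes = (QK_K - 4 * QK_K / 64) / 5; /* 240 weights, 5 per byte = 48 */
    int qh_bytes = QK_K / 64;                  /* 16 weights, 4 per byte  = 4  */
    int total    = qs_bytes + qh_bytes + SIZEOF_GGML_HALF; /* 54 bytes */
    return 8.0 * total / QK_K;
}
```

Under these assumptions the block is 48 + 4 + 2 = 54 bytes for 256 weights, i.e. 432 / 256 = 1.6875 bits per weight. For comparison, storing 4 weights per byte costs 2 bits per weight before any scale is added.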
