πŸ›‘ Note: not every quant is displayed in the table on the right; you can find everything here.

Using llama.cpp release b3804 for quantization.

Original model: https://huggingface.co/ifable/gemma-2-Ifable-9B

All quants were made using the imatrix option (except BF16, which is the original precision). The imatrix was generated with the dataset from here, using the BF16 GGUF with a context size of 8192 tokens (the default is 512, but a value equal to or greater than the model's context size should improve quality) and 13 chunks.
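
For reference, the imatrix step looks roughly like this. This is a sketch only: the file names are placeholders, and the flags are the ones documented in the llama.cpp imatrix example linked below.

```bash
# Generate an importance matrix from the BF16 GGUF.
# -c 8192 sets the context size; --chunks 13 limits how many chunks of the
# calibration dataset are processed (file names here are placeholders).
./llama-imatrix \
  -m gemma-2-Ifable-9B-BF16.gguf \
  -f calibration_dataset.txt \
  -o imatrix.dat \
  -c 8192 \
  --chunks 13
```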

How to make your own quants:

https://github.com/ggerganov/llama.cpp/tree/master/examples/imatrix

https://github.com/ggerganov/llama.cpp/tree/master/examples/quantize
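
And a minimal sketch of the quantization step itself, assuming the imatrix file produced above; the model file name and quant type are placeholders (see the quantize example link for the full option list):

```bash
# Quantize the BF16 GGUF to Q4_K_M using the importance matrix.
./llama-quantize \
  --imatrix imatrix.dat \
  gemma-2-Ifable-9B-BF16.gguf \
  gemma-2-Ifable-9B-Q4_K_M.gguf \
  Q4_K_M
```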

