Model Information

Quantized version of ibm-granite/granite-3.2-2b-instruct using torch.float32 for quantization tuning.

  • 4 bits (INT4)
  • group size = 128
  • Asymmetric quantization
  • AutoAWQ export format

Quantization framework: Intel AutoRound v0.4.7

Note: this INT4 version of granite-3.2-2b-instruct has been quantized to run inference on CPU.
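
For reference, below is a minimal loading sketch. It assumes the published checkpoint (fbaldassarri/ibm-granite_granite-3.2-2b-instruct-autoawq-int4-gs128-asym) can be loaded through the standard transformers API; depending on your transformers/autoawq versions, running AWQ-format weights purely on CPU may require additional backends, so treat this as a sketch rather than a tested recipe.

  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Repository id of this quantized model (assumption: loads via the standard transformers API)
  model_id = "fbaldassarri/ibm-granite_granite-3.2-2b-instruct-autoawq-int4-gs128-asym"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)

  # Quick generation sanity check on CPU
  prompt = "Explain INT4 quantization in one sentence."
  inputs = tokenizer(prompt, return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=64)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))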

Replication Recipe

Step 1 Install Requirements

I suggest installing the requirements into a dedicated Python virtualenv or a conda environment.

wget https://github.com/intel/auto-round/archive/refs/tags/v0.4.7.tar.gz
tar -xvzf v0.4.7.tar.gz
cd auto-round-0.4.7
pip install -r requirements-cpu.txt --upgrade

Step 2 Build and Install Intel AutoRound from Source

pip install -vvv --no-build-isolation -e .[cpu]
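
To confirm the build, you can try a quick import check from the same environment (this assumes the package exposes a __version__ attribute):

  import auto_round

  # Should print 0.4.7 if the editable install from Step 2 succeeded
  print(auto_round.__version__)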

Step 3 Script for Quantization

  from transformers import AutoModelForCausalLM, AutoTokenizer
  from auto_round import AutoRound

  # Load the base model and tokenizer
  model_name = "ibm-granite/granite-3.2-2b-instruct"
  model = AutoModelForCausalLM.from_pretrained(model_name)
  tokenizer = AutoTokenizer.from_pretrained(model_name)

  # Quantization configuration: INT4, group size 128, asymmetric, tuned on CPU
  bits, group_size, sym, device = 4, 128, False, 'cpu'
  autoround = AutoRound(model, tokenizer, nsamples=128, iters=200, seqlen=512, batch_size=4,
                        bits=bits, group_size=group_size, sym=sym, device=device)
  autoround.quantize()

  # Export the quantized model in AutoAWQ format
  output_dir = "./AutoRound/ibm-granite_granite-3.2-2b-instruct-autoawq-int4-gs128-asym"
  autoround.save_quantized(output_dir, format='auto_awq', inplace=True)
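
As an optional sanity check, the exported folder can be reloaded before publishing it. This continues from the script above (output_dir and model_name are already defined) and assumes the AWQ-format export is readable by transformers on your setup; the tokenizer is reloaded from the original model in case it was not written to output_dir.

  # Reload the exported INT4 checkpoint and run a short generation
  quantized_model = AutoModelForCausalLM.from_pretrained(output_dir)
  check_tokenizer = AutoTokenizer.from_pretrained(model_name)
  inputs = check_tokenizer("Hello", return_tensors="pt")
  print(check_tokenizer.decode(quantized_model.generate(**inputs, max_new_tokens=16)[0]))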

License

Apache 2.0 License

Disclaimer

This quantized model comes with no warranty. It has been developed only for research purposes.
