How to use the quantized version?
#2 opened by jawad1347
Kindly write code to use it in Colab, loading it with 4-bit quantization. Thanks!
Sure, you can load it in 4-bit with BitsAndBytes:
import torch
from transformers import AutoModel, BitsAndBytesConfig

# Ref: https://huggingface.co/blog/4bit-transformers-bitsandbytes
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bf16
)

model_kwargs = {}  # any extra from_pretrained kwargs; empty here

model = AutoModel.from_pretrained(
    'Salesforce/SFR-Embedding-2_R',
    device_map='auto',
    trust_remote_code=True,
    quantization_config=quantization_config,
    **model_kwargs,
)
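If it helps, here is a minimal usage sketch with the quantized model. The last-token pooling below is an assumption based on how SFR-style embedding models are typically used; check the model card for the exact pooling and prompt format.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Salesforce/SFR-Embedding-2_R')
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # make sure padding works

sentences = ["What is the capital of France?", "Paris is the capital of France."]
batch = tokenizer(sentences, padding=True, truncation=True,
                  return_tensors='pt').to(model.device)

with torch.no_grad():
    hidden = model(**batch).last_hidden_state

# Last-token pooling (assumed), handling both left- and right-padded batches
attn = batch['attention_mask']
if attn[:, -1].all():                  # left padding: last position is always real
    embeddings = hidden[:, -1]
else:                                  # right padding: pick each sequence's last real token
    last_idx = attn.sum(dim=1) - 1
    embeddings = hidden[torch.arange(hidden.size(0), device=hidden.device), last_idx]

embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-normalize
print(embeddings @ embeddings.T)                  # cosine similarities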
Can it be quantized to GPTQ or AWQ, or are those formats not compatible with this architecture?
Hi @prudant,
Of course, it can also be quantized with other methods such as GPTQ or AWQ.
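For example, GPTQ is available through the transformers integration (requires optimum and auto-gptq). The sketch below is untested on this particular checkpoint, the integration is documented mainly for causal LMs so AutoModel may need care, and "c4" is just the stock calibration dataset option. AWQ would similarly go through the AutoAWQ tooling; since the model is decoder-based, the formats should be architecturally compatible, but it is worth validating embedding quality afterwards.

from transformers import AutoModel, AutoTokenizer, GPTQConfig

model_id = 'Salesforce/SFR-Embedding-2_R'
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization on load; `pip install optimum auto-gptq` first
gptq_config = GPTQConfig(
    bits=4,
    dataset="c4",        # built-in calibration dataset option
    tokenizer=tokenizer,
)

model = AutoModel.from_pretrained(
    model_id,
    device_map='auto',
    trust_remote_code=True,
    quantization_config=gptq_config,
)

# model.save_pretrained("sfr-embedding-2-gptq")  # persist the quantized weights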