How to use the quantized version?
#2 opened by jawad1347
Kindly write code to use it in Colab, loading it with 4-bit quantization. Thanks!
Sure, you can load it in 4-bit with BitsAndBytes:
import torch
from transformers import AutoModel, BitsAndBytesConfig

# Ref: https://huggingface.co/blog/4bit-transformers-bitsandbytes
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bf16
)

model_kwargs = {}  # any extra from_pretrained kwargs; empty here

model = AutoModel.from_pretrained(
    'Salesforce/SFR-Embedding-2_R',
    device_map='auto',
    trust_remote_code=True,
    quantization_config=quantization_config,
    **model_kwargs,
)
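If it helps, here is a minimal usage sketch with the quantized model. The last-token pooling below is an assumption based on how SFR-style embedding models are typically used; check the model card for the exact pooling and prompt format.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Salesforce/SFR-Embedding-2_R')
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # make sure padding works

sentences = ["What is the capital of France?", "Paris is the capital of France."]
batch = tokenizer(sentences, padding=True, truncation=True,
                  return_tensors='pt').to(model.device)

with torch.no_grad():
    hidden = model(**batch).last_hidden_state

# Last-token pooling (assumed), handling both left- and right-padded batches
attn = batch['attention_mask']
if attn[:, -1].all():                  # left padding: last position is always real
    embeddings = hidden[:, -1]
else:                                  # right padding: pick each sequence's last real token
    last_idx = attn.sum(dim=1) - 1
    embeddings = hidden[torch.arange(hidden.size(0), device=hidden.device), last_idx]

embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-normalize
print(embeddings @ embeddings.T)                  # cosine similarities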
Can it be quantized to GPTQ or AWQ, or are those formats not compatible with this architecture?
Hi @prudant,
Of course, it can also be quantized with other methods such as GPTQ or AWQ.
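For example, GPTQ is available through the transformers integration (requires optimum and auto-gptq). The sketch below is untested on this particular checkpoint, the integration is documented mainly for causal LMs so AutoModel may need care, and "c4" is just the stock calibration dataset option. AWQ would similarly go through the AutoAWQ tooling; since the model is decoder-based, the formats should be architecturally compatible, but it is worth validating embedding quality afterwards.

from transformers import AutoModel, AutoTokenizer, GPTQConfig

model_id = 'Salesforce/SFR-Embedding-2_R'
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization on load; `pip install optimum auto-gptq` first
gptq_config = GPTQConfig(
    bits=4,
    dataset="c4",        # built-in calibration dataset option
    tokenizer=tokenizer,
)

model = AutoModel.from_pretrained(
    model_id,
    device_map='auto',
    trust_remote_code=True,
    quantization_config=gptq_config,
)

# model.save_pretrained("sfr-embedding-2-gptq")  # persist the quantized weights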