This is the inceptionai/jais-family-13b-chat model converted to OpenVINO with INT4 symmetric channel-wise weight compression.

Download the model

  • Install huggingface-hub
pip install huggingface-hub[cli]
  • Download the model
huggingface-cli download helenai/jais-family-13b-chat-ov-int4-sym --local-dir jais-family-13b-chat-ov-int4-sym
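The same download can be done from Python with the huggingface_hub API instead of the CLI. This is a sketch, assuming huggingface_hub is installed; `snapshot_download` fetches all model files into the given local directory.

```python
# Sketch: download the model with the huggingface_hub Python API
# instead of huggingface-cli (assumes huggingface_hub is installed).
from huggingface_hub import snapshot_download

MODEL_ID = "helenai/jais-family-13b-chat-ov-int4-sym"
LOCAL_DIR = MODEL_ID.split("/")[-1]  # same directory name as the CLI example

if __name__ == "__main__":
    # Downloads all model files (several GB for a 13B INT4 model) into LOCAL_DIR.
    snapshot_download(repo_id=MODEL_ID, local_dir=LOCAL_DIR)
```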

Run inference

The recommended way to run inference with this model is with OpenVINO GenAI. It is the only package needed for inference; there is no need to install Transformers or PyTorch.

  • Install OpenVINO GenAI nightly
pip install --pre --upgrade openvino-genai openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
  • Download a chat sample script (curl -O works on Windows Command Prompt and most Linux terminals)
curl -O https://raw.githubusercontent.com/helena-intel/snippets/refs/heads/main/llm_chat/python/llm_chat.py
  • Run the chat script with the path to the model and the device as parameters. Change GPU to CPU to run on CPU. NPU is not yet supported for this model.
python llm_chat.py jais-family-13b-chat-ov-int4-sym GPU

More information

Check out OpenVINO GenAI documentation and OpenVINO GenAI samples.

Model compression parameters

openvino_version         : 2025.2.0-18660-3ceeeb52d64

advanced_parameters      : {'statistics_path': None, 'awq_params': {'subset_size': 32, 'percent_to_apply': 0.002, 'alpha_min': 0.0, 'alpha_max': 1.0, 'steps': 100}, 'scale_estimation_params': {'subset_size': 64, 'initial_steps': 5, 'scale_steps': 5, 'weight_penalty': -1.0}, 'gptq_params': {'damp_percent': 0.1, 'block_size': 128, 'subset_size': 128}, 'lora_correction_params': {'adapter_rank': 8, 'num_iterations': 3, 'apply_regularization': True, 'subset_size': 128, 'use_int8_adapters': True}}
all_layers               : False
awq                      : False
backup_mode              : int8_asym
gptq                     : False
group_size               : -1
ignored_scope            : []
lora_correction          : False
mode                     : int4_sym
ratio                    : 1.0
scale_estimation         : False
sensitivity_metric       : weight_quantization_error

optimum_intel_version    : 1.22.0
optimum_version          : 1.24.0
pytorch_version          : 2.5.1+cpu
transformers_version     : 4.48.3
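The parameters above (int4_sym mode, ratio 1.0, group_size -1 for channel-wise compression, int8_asym backup) correspond roughly to an optimum-cli export such as the following. This is a hedged reconstruction from the table, not the exact command used to produce this model:

```shell
# Sketch: re-export the original model with matching compression settings
# (reconstructed from the parameter table above; not the recorded command).
optimum-cli export openvino \
  --model inceptionai/jais-family-13b-chat \
  --weight-format int4 --sym --ratio 1.0 --group-size -1 \
  --backup-precision int8_asym \
  --trust-remote-code \
  jais-family-13b-chat-ov-int4-sym
```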