QwQ-32B-Preview-bnb-4bit

Introduction

QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the QwQ-32B-Preview model, produced with the bitsandbytes (bnb) quantization library. Quantizing the weights to 4 bits cuts the model's on-disk size and memory footprint to roughly a quarter of the 16-bit original, making it practical to deploy on resource-constrained hardware such as a single consumer GPU.
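As a back-of-the-envelope illustration of the savings, the weight storage at a given precision is simply parameters × bits ÷ 8. The sketch below uses the 32.5B parameter count from this card; real memory use is higher once the KV cache, activations, and quantization metadata (scales, block statistics) are added.

```python
# Rough weight-memory estimate for a 32.5B-parameter model.
# Illustrative only: actual runtime footprint also includes KV cache,
# activations, and per-block quantization metadata.

PARAMS = 32.5e9  # parameter count from the model card


def weight_gb(bits_per_param: float, params: float = PARAMS) -> float:
    """Approximate weight storage in gigabytes at a given precision."""
    return params * bits_per_param / 8 / 1e9


print(f"BF16 weights:  ~{weight_gb(16):.2f} GB")  # 16-bit baseline
print(f"4-bit weights: ~{weight_gb(4):.2f} GB")   # bnb 4-bit
```

At 16 bits the weights alone are about 65 GB, versus roughly 16 GB at 4 bits, which is the difference between needing multiple data-center GPUs and fitting on a single 24 GB card.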

Model Details

  • Quantization: 4-bit, using the bitsandbytes (bnb) library
  • Base Model: Qwen/QwQ-32B-Preview
  • Parameters: 32.5 billion
  • Context Length: Up to 32,768 tokens
  • Checkpoint Format: Safetensors, 17.7B stored parameters (lower than the 32.5B logical count because the 4-bit weights are packed)
  • Tensor Types: F32, BF16, U8
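Because the quantization config is saved alongside the checkpoint, loading it with `transformers` needs no extra arguments beyond the usual `from_pretrained` call. A minimal sketch, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU with roughly 20 GB of free VRAM is available (the `load` helper below is illustrative, not part of any published API):

```python
# Loading sketch: the 4-bit bitsandbytes config ships inside the
# checkpoint, so no explicit BitsAndBytesConfig is required here.

MODEL_ID = "kurcontko/QwQ-32B-Preview-bnb-4bit"


def load(model_id: str = MODEL_ID):
    """Download and load the prequantized tokenizer and model.

    Imports are kept inside the function because calling this fetches
    ~18 GB of weights and requires a CUDA-capable GPU.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model
```

Calling `load()` triggers the full checkpoint download on first use; `device_map="auto"` lets `accelerate` place the layers across available GPUs (and CPU, if VRAM runs short).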