QwQ-32B-Preview-bnb-4bit

Introduction

QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the QwQ-32B-Preview model, produced with the bitsandbytes (bnb) quantization library. Quantizing the weights to 4 bits cuts the model's on-disk size and memory footprint to roughly a quarter of the 16-bit original, making it practical to deploy on resource-constrained hardware such as a single consumer GPU.
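As a back-of-the-envelope illustration of the savings, the weight storage at a given precision is simply parameters × bits ÷ 8. The sketch below uses the 32.5B parameter count from this card; real memory use is higher once the KV cache, activations, and quantization metadata (scales, block statistics) are added.

```python
# Rough weight-memory estimate for a 32.5B-parameter model.
# Illustrative only: actual runtime footprint also includes KV cache,
# activations, and per-block quantization metadata.

PARAMS = 32.5e9  # parameter count from the model card


def weight_gb(bits_per_param: float, params: float = PARAMS) -> float:
    """Approximate weight storage in gigabytes at a given precision."""
    return params * bits_per_param / 8 / 1e9


print(f"BF16 weights:  ~{weight_gb(16):.2f} GB")  # 16-bit baseline
print(f"4-bit weights: ~{weight_gb(4):.2f} GB")   # bnb 4-bit
```

At 16 bits the weights alone are about 65 GB, versus roughly 16 GB at 4 bits, which is the difference between needing multiple data-center GPUs and fitting on a single 24 GB card.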

Model Details

  • Quantization: 4-bit, using the bitsandbytes (bnb) library
  • Base Model: Qwen/QwQ-32B-Preview
  • Parameters: 32.5 billion
  • Context Length: Up to 32,768 tokens
  • Checkpoint Format: Safetensors, 17.7B stored parameters (lower than the 32.5B logical count because the 4-bit weights are packed)
  • Tensor Types: F32, BF16, U8
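Because the quantization config is saved alongside the checkpoint, loading it with `transformers` needs no extra arguments beyond the usual `from_pretrained` call. A minimal sketch, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU with roughly 20 GB of free VRAM is available (the `load` helper below is illustrative, not part of any published API):

```python
# Loading sketch: the 4-bit bitsandbytes config ships inside the
# checkpoint, so no explicit BitsAndBytesConfig is required here.

MODEL_ID = "kurcontko/QwQ-32B-Preview-bnb-4bit"


def load(model_id: str = MODEL_ID):
    """Download and load the prequantized tokenizer and model.

    Imports are kept inside the function because calling this fetches
    ~18 GB of weights and requires a CUDA-capable GPU.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model
```

Calling `load()` triggers the full checkpoint download on first use; `device_map="auto"` lets `accelerate` place the layers across available GPUs (and CPU, if VRAM runs short).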