---
license: apache-2.0
tags:
- gptq
- quantization
- vllm
- text-generation
- transformer
inference: false
library_name: vllm
model_creator: menlo
base_model: Menlo/Jan-nano
---

# Jan-nano GPTQ 4bit (vLLM-ready)

This is a 4-bit GPTQ-quantized version of [Menlo/Jan-nano](https://huggingface.co/Menlo/Jan-nano), optimized for fast inference with [vLLM](https://github.com/vllm-project/vllm).

- **Quantization**: GPTQ (4-bit)
- **Group size**: 128
- **Dtype**: float16
- **Backend**: `gptqmodel`
- **Max context length**: 4096 tokens

---

## 🔧 Usage with vLLM

```bash
vllm serve ./jan-nano-4b-gptqmodel-4bit \
  --quantization gptq \
  --dtype half \
  --max-model-len 4096
```

Python sketches for querying this server and for offline inference appear at the end of this card.

---

## 📁 Files

- Sharded `.safetensors` model weights
- `model.safetensors.index.json`
- `tokenizer.json`, `tokenizer_config.json`
- `config.json`, `generation_config.json`, `quantize_config.json` (if available)

---

## 🙏 Credits

- Original model by [Menlo](https://huggingface.co/Menlo)
- Quantized and shared by [ramgpt](https://huggingface.co/ramgpt)
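
---

## 🔌 Querying the server (Python)

`vllm serve` exposes an OpenAI-compatible API, by default on port 8000. Below is a minimal sketch using the `openai` client; the port, prompt, and sampling settings are illustrative, and the model name must match the path passed to `vllm serve`.

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
# vLLM does not require a real API key unless started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="./jan-nano-4b-gptqmodel-4bit",  # must match the served model path
    prompt="Explain GPTQ quantization in one sentence.",
    max_tokens=64,
)
print(response.choices[0].text)
```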
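
---

## 🐍 Offline inference with vLLM (Python)

A minimal sketch using vLLM's Python API instead of the server. The local path, prompt, and sampling settings are illustrative; the quantization flags mirror the `vllm serve` command above.

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint with the same settings as the serve command.
llm = LLM(
    model="./jan-nano-4b-gptqmodel-4bit",  # local path; adjust to your checkout
    quantization="gptq",
    dtype="half",
    max_model_len=4096,
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain GPTQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```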