---
license: apache-2.0
tags:
- gptq
- quantization
- vllm
- text-generation
- transformer
inference: false
library_name: vllm
model_creator: menlo
base_model: Menlo/Jan-nano
---

# Jan-nano GPTQ 4bit (vLLM-ready)

This is a 4-bit GPTQ-quantized version of [Menlo/Jan-nano](https://huggingface.co/Menlo/Jan-nano), optimized for fast inference with [vLLM](https://github.com/vllm-project/vllm).

- **Quantization**: GPTQ (4-bit)
- **Group size**: 128
- **Dtype**: float16
- **Backend**: `gptqmodel`
- **Max context length**: 4096 tokens

---

## 🔧 Usage with vLLM

```bash
vllm serve ./jan-nano-4b-gptqmodel-4bit \
  --quantization gptq \
  --dtype half \
  --max-model-len 4096
```

Python sketches for querying this server and for offline inference appear at the end of this card.

---

## 📁 Files

- Sharded `.safetensors` model weights
- `model.safetensors.index.json`
- `tokenizer.json`, `tokenizer_config.json`
- `config.json`, `generation_config.json`, `quantize_config.json` (if available)

---

## 🙏 Credits

- Original model by [Menlo](https://huggingface.co/Menlo)
- Quantized and shared by [ramgpt](https://huggingface.co/ramgpt)
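
---

## 🔌 Querying the server (Python)

`vllm serve` exposes an OpenAI-compatible API, by default on port 8000. Below is a minimal sketch using the `openai` client; the port, prompt, and sampling settings are illustrative, and the model name must match the path passed to `vllm serve`.

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
# vLLM does not require a real API key unless started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="./jan-nano-4b-gptqmodel-4bit",  # must match the served model path
    prompt="Explain GPTQ quantization in one sentence.",
    max_tokens=64,
)
print(response.choices[0].text)
```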
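
---

## 🐍 Offline inference with vLLM (Python)

A minimal sketch using vLLM's Python API instead of the server. The local path, prompt, and sampling settings are illustrative; the quantization flags mirror the `vllm serve` command above.

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint with the same settings as the serve command.
llm = LLM(
    model="./jan-nano-4b-gptqmodel-4bit",  # local path; adjust to your checkout
    quantization="gptq",
    dtype="half",
    max_model_len=4096,
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain GPTQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```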