Jan-nano GPTQ 4bit (vLLM-ready)

This is a 4-bit GPTQ-quantized version of Menlo/Jan-nano, optimized for fast inference with vLLM.

  • Quantization: GPTQ (4-bit)
  • Group size: 128
  • Dtype: float16
  • Backend: gptqmodel
  • Max context length: 4096 tokens
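
As a rough back-of-the-envelope check on what the 4-bit / group-size-128 setting means for storage, the sketch below computes the effective bits per quantized weight. It assumes one fp16 scale and one packed 4-bit zero-point per group and ignores layers typically left unquantized (e.g. embeddings), so treat it as an estimate rather than the exact gptqmodel layout:

# Effective storage per quantized weight implied by the settings above.
# Assumption: one fp16 scale and one packed 4-bit zero-point per group of 128;
# the exact gptqmodel packing and any unquantized layers will shift the number.
GROUP_SIZE = 128
WEIGHT_BITS = 4      # 4-bit quantized weight
SCALE_BITS = 16      # fp16 scale, one per group
ZERO_BITS = 4        # packed zero-point, one per group

effective_bits = WEIGHT_BITS + (SCALE_BITS + ZERO_BITS) / GROUP_SIZE
print(f"~{effective_bits:.2f} bits per weight (vs. 16 for float16, ~3.8x smaller)")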

🔧 Usage with vLLM

vllm serve ./jan-nano-4b-gptqmodel-4bit \
  --quantization gptq \
  --dtype half \
  --max-model-len 4096
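
Once the server is running, vLLM exposes an OpenAI-compatible API (on localhost:8000 by default). Below is a minimal sketch of querying it with the openai Python client; the served model name defaults to the path given to vllm serve, so adjust it if you pass --served-model-name:

# Minimal example of calling the OpenAI-compatible endpoint started above.
# Assumes the default host/port (localhost:8000) and the `openai` Python package.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./jan-nano-4b-gptqmodel-4bit",  # matches the path passed to `vllm serve`
    messages=[{"role": "user", "content": "Summarize GPTQ quantization in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)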

📁 Files

  • Sharded .safetensors model weights
  • model.safetensors.index.json
  • tokenizer.json, tokenizer_config.json
  • config.json, generation_config.json, quantize_config.json (if available)

🙏 Credits

  • Original model by Menlo
  • Quantized and shared by ramgpt

Model tree for ramgpt/jan-nano-4b-gptqmodel-4bit

  • Base model: Qwen/Qwen3-4B-Base
  • Finetuned: Qwen/Qwen3-4B
  • Finetuned: Menlo/Jan-nano
  • Quantized: this model