vLLM fails to serve

#2
by dinerburger - opened

Getting the following error:

VLLM_USE_V1=0 vllm serve /opt/models/textgen/gemma-3-27b-it-unsloth-bnb-4bit --max-model-len 32768 --port 9999

ERROR 04-03 14:27:06 [engine.py:448] KeyError: 'layers.47.mlp.down_proj.weight.absmax'

This key is listed in the safetensors index, but vLLM is having trouble resolving it for whatever reason.

Forgot to pass the --quantization parameter, sorry.
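
For anyone landing here later, a sketch of the corrected invocation, assuming bitsandbytes is the intended quantization backend for this bnb-4bit checkpoint (the exact flag value may differ across vLLM versions):

VLLM_USE_V1=0 vllm serve /opt/models/textgen/gemma-3-27b-it-unsloth-bnb-4bit --max-model-len 32768 --port 9999 --quantization bitsandbytes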

dinerburger changed discussion status to closed
