Running with "vllm serve" fails with KeyError: 'layers.0.mlp.down_proj.weight.absmax'
#1, opened by ly0303521
My command:
vllm serve --max-model-len=1000 --gpu-memory-utilization=0.5 unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit
It fails with:
KeyError: 'layers.0.mlp.down_proj.weight.absmax'
How can I use this model with vLLM?
Loading it with OrpheusModel produces the same error:
model = OrpheusModel(model_name="unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit")