--- base_model: - werty1248/Qwen2.5-32B-s1.1-Ko-Native --- ### vllm - For 24GB VRAM - max-model-len: <4096 (marlin_awq) - not available - max-model-len: 10240 (2048 + 8192) (awq) ``` vllm serve werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ --max-model-len 10240 --quantization awq --dtype half --port 8000 --gpu-memory-utilization 0.99 --enforce_eager ```