---
base_model:
- werty1248/Qwen2.5-32B-s1.1-Ko-Native
---

### vllm

- For 24GB VRAM
  - max-model-len: <4096 (marlin_awq) - not available
  - max-model-len: 10240 (2048 + 8192) (awq)

```
vllm serve werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ --max-model-len 10240 --quantization awq --dtype half --port 8000 --gpu-memory-utilization 0.99 --enforce_eager
```