Can ignored layers be supported in the w8a8_int8 quantization setting?
#12 opened 14 days ago by jgfly
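For context, a minimal sketch (not this repo's actual API) of how an ignore list is typically applied in a W8A8 INT8 flow: parameters whose names match the patterns keep their original precision, while everything else is handed to the quantizer. The `IGNORE_PATTERNS` values and helper names are hypothetical.

```python
import re
import torch

# Hypothetical ignore patterns; lm_head and embeddings are common
# choices to keep in higher precision.
IGNORE_PATTERNS = [r"lm_head", r"embed_tokens"]

def should_ignore(name: str) -> bool:
    """Return True if this parameter should skip int8 quantization."""
    return any(re.search(p, name) for p in IGNORE_PATTERNS)

def apply_ignore_list(state_dict: dict, quantize_fn) -> dict:
    """Quantize 2-D weights except those matching the ignore list."""
    out = {}
    for name, weight in state_dict.items():
        if weight.dim() == 2 and not should_ignore(name):
            # quantize_fn returns {name: int8_weight, name + "_scale": scales}
            out.update(quantize_fn(name, weight))
        else:
            out[name] = weight  # ignored layers pass through as-is
    return out
```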
Can I run this model on an AMD GPU, or is it only compatible with Nvidia GPUs?
#11 opened about 1 month ago by luciagan
Update inference/bf16_cast_channel_int8.py
#10 opened 2 months ago by HandH1998
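The PR above touches the channel-wise bf16→int8 cast script. As a sketch of the core operation a script like `inference/bf16_cast_channel_int8.py` performs, symmetric per-output-channel int8 quantization looks roughly like this; the function name and return convention are assumptions, not the script's actual contents.

```python
import torch

def bf16_to_channel_int8(weight: torch.Tensor):
    """Symmetric per-output-channel int8 quantization of a bf16 weight.

    Returns the int8 tensor plus one fp32 scale per output channel,
    chosen so each channel's max |value| maps to 127.
    """
    w = weight.float()
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale.squeeze(1)

# Round-trip check: dequantized weights should be close to the original.
w = torch.randn(8, 16, dtype=torch.bfloat16)
q, s = bf16_to_channel_int8(w)
w_hat = q.float() * s.unsqueeze(1)
print((w.float() - w_hat).abs().max())  # small per-channel quantization error
```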
Update config.json
#9 opened 2 months ago by HandH1998
How can 2500 TPS throughput be achieved?
#8 opened 2 months ago by muziyongshixin
Can this model run with `ollama` in pure-CPU mode?
#7 opened 2 months ago by ice6
Add `quantization_config` to config.json?
#4 opened 2 months ago by WeiwenXia
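Regarding the request above: a `quantization_config` block lets loaders discover the quantization scheme directly from config.json. Below is a hedged sketch of adding one in Python; the exact keys a given runtime expects are an assumption (modeled loosely on common quantization-config fields) and should be verified against the target inference engine.

```python
import json

with open("config.json") as f:
    config = json.load(f)

# Illustrative fields only; verify the key names against the inference
# engine (e.g. sglang) that will consume this config.
config["quantization_config"] = {
    "quant_method": "w8a8_int8",    # assumed identifier
    "weight_dtype": "int8",
    "activation_dtype": "int8",
    "granularity": "per-channel",   # channel-wise weight scales
    "ignored_layers": ["lm_head"],  # kept in bf16 (hypothetical)
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```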
sglang reports an OOM error after running channel INT8
#3 opened 2 months ago by zhangneilc