Why is Int4 slower than unquantized float32 and float16?
3
#3 opened 3 months ago by hujianmin
Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 model takes too long to load (nearly 2 hours)
#2 opened 11 months ago by TimVan1

Not working with sample code
3
#1 opened about 1 year ago by rupeshs