Could you give me a reason why you ignore the kv_a_proj_with_mqa layer when quantizing this model?
1 · #10 opened 11 days ago by superahn
Frequent interruptions during reasoning with vllm 0.8.1
#9 opened about 1 month ago by alwinzhang
Stuck when running on 8xH100
1 · #8 opened about 2 months ago by Thai
Accuracy test
#1 opened 2 months ago by zhnagchenchne