Was the benchmark run with vLLM? With input/output = 500/2000?
#6, opened by chuanyizjc
I am currently testing nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 and get a throughput of ~1k, which is not a 4x improvement. I would like to know why.
Hi,
Sorry for the delayed response.
You will always get a speed improvement, but whether it reaches 4x depends on the setup: hardware, input and output length, and batch size.
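
To compare two setups fairly, it can help to compute throughput and speedup the same way for both runs, with hardware, input/output length, and batch size held fixed. The snippet below is only a minimal sketch of that arithmetic: the helper names `output_throughput` and `speedup` are hypothetical, and the timings are made-up placeholders, not measured results for any model.

```python
def output_throughput(num_requests: int, output_len: int, elapsed_s: float) -> float:
    """Generated tokens per second, summed over all requests in the batch."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_requests * output_len / elapsed_s


def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return candidate_tps / baseline_tps


# Placeholder timings (assumptions for illustration only). Both runs must use
# the same hardware, input/output length, and batch size to be comparable.
baseline_tps = output_throughput(num_requests=32, output_len=2000, elapsed_s=256.0)
candidate_tps = output_throughput(num_requests=32, output_len=2000, elapsed_s=64.0)

print(f"baseline:  {baseline_tps:.1f} tok/s")
print(f"candidate: {candidate_tps:.1f} tok/s")
print(f"speedup:   {speedup(baseline_tps, candidate_tps):.2f}x")
```

Replace the placeholder values with your own measured wall-clock times. Repeating the comparison at several batch sizes and output lengths shows how much the ratio changes with the setting.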