Was the benchmark run with vLLM? Were the input/output lengths 500/2000?

#6
by chuanyizjc - opened


In my test, nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 reaches a throughput of about 1k, which is not a 4x improvement. Why is that?

NVIDIA org

Hi,

Sorry for the delayed response.
You will always get a speed improvement, but whether it reaches 4x depends on the setup: hardware, input and output length, and batch size.
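
For comparing two runs under the same settings, a minimal sketch of the arithmetic might look like the following. The function names and all numbers here are hypothetical placeholders for illustration, not measurements of this model or the method used for the published figure; it assumes every request generates the same number of output tokens.

```python
def output_throughput(num_requests: int, output_len: int, elapsed_s: float) -> float:
    """Generated tokens per second across all requests in one benchmark run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_requests * output_len / elapsed_s


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return candidate_tps / baseline_tps


# Hypothetical placeholder values: same hardware, batch size, and
# input/output lengths for both runs; only the elapsed time differs.
baseline_tps = output_throughput(num_requests=8, output_len=2000, elapsed_s=40.0)
candidate_tps = output_throughput(num_requests=8, output_len=2000, elapsed_s=20.0)
ratio = speedup(candidate_tps, baseline_tps)
print(f"baseline={baseline_tps:.0f} tok/s, candidate={candidate_tps:.0f} tok/s, speedup={ratio:.1f}x")
```

The ratio is only meaningful when both runs share the same hardware, batch size, and input/output lengths, since each of those changes the throughput on its own.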
