Time to first token for GPU is wrong

#4
by yuimo - opened

Is the time-to-first-token figure for GPU inference wrong?
I think it should be 1024 / 620 ≈ 1.65 s.
(Attachment: image.png)
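
For anyone checking the numbers, here is the same arithmetic as a minimal Python sketch. It assumes that 1024 is the prompt length in tokens and 620 is the prompt-processing (prefill) throughput in tokens per second; the post does not label either number, and the function name is hypothetical. It also ignores any fixed overhead such as model loading or kernel launch, so treat the result as a rough lower bound rather than a measured value.

```python
def estimate_ttft_seconds(prompt_tokens: int, prefill_tokens_per_second: float) -> float:
    """Rough time to first token: prompt length divided by prefill throughput.

    Both inputs are assumptions about what the numbers in the post mean.
    """
    if prefill_tokens_per_second <= 0:
        raise ValueError("prefill throughput must be positive")
    return prompt_tokens / prefill_tokens_per_second


# Assumed meaning of the values from the post: 1024 prompt tokens, 620 tokens/s.
ttft = estimate_ttft_seconds(1024, 620)
print(f"{ttft:.2f} s")  # prints "1.65 s"
```

If the reported GPU value differs a lot from this estimate, it may be worth checking whether it was measured with a different prompt length or includes extra overhead.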
