Is the time-to-first-token figure for GPU inference wrong? I think it should be 1024 / 620 = 1.65 s.
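For reference, that estimate can be sketched as below. This assumes 1024 is the prompt length in tokens and 620 is the prompt-processing (prefill) throughput in tokens per second; the thread does not state either unit, and the function name is made up for illustration.

```python
def estimate_ttft_seconds(prompt_tokens: int, prefill_tokens_per_second: float) -> float:
    """Rough time to first token: prompt length divided by prefill throughput.

    Assumption: the first token arrives once the whole prompt has been processed,
    and any fixed overhead (scheduling, kernel launch, sampling) is ignored.
    """
    if prefill_tokens_per_second <= 0:
        raise ValueError("prefill throughput must be positive")
    return prompt_tokens / prefill_tokens_per_second


# Numbers taken from the comment above; their units are assumed, not confirmed.
ttft = estimate_ttft_seconds(1024, 620)
print(f"{ttft:.2f} s")  # 1.65 s
```

If the published figure was measured with a different prompt length or includes fixed overhead, it would not match this simple ratio.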
Hello