Serving this 1M model with vLLM triggers a download of the Coder-0.6B safetensors (a 1.5 GB file), and responses are capped at a context window of no more than 40k tokens. Is anything wrong?
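For context, a minimal sketch of the kind of serve invocation in question — `<model-id>` is a placeholder, and the `--max-model-len` value shown is an assumption about the intended 1M window, not the exact command used:

```shell
# Sketch only: request an explicit context length instead of relying on the
# default derived from the model config. <model-id> is a placeholder for the
# actual checkpoint name.
vllm serve <model-id> --max-model-len 1000000
```

If `--max-model-len` is not passed, vLLM takes the limit from the model's config, so the startup logs showing the effective maximum sequence length may reveal where the 40k cap is coming from.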