Context length: is it 128K (as mentioned in the model card) or 160K (as specified in config.json)?
#17 · opened by Lissanro
I noticed that https://huggingface.co/deepseek-ai/DeepSeek-V3.1/blob/main/config.json mentions a 160K context length rather than 128K. Not sure if this is a mistake, or if the 128K limit mentioned in the model card applies to input tokens, with an additional 32K on top reserved for output?
R1 0528 had 160K context as well, so I am curious whether it was really reduced for V3.1 or is still the same.
If you look at tokenizer_config.json, it actually says model_max_length is 128K.
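For anyone who wants to compare the two values directly, here is a minimal sketch that reads both files from the repo with huggingface_hub; it assumes the standard field names these files use (max_position_embeddings in config.json, model_max_length in tokenizer_config.json):

```python
import json
from huggingface_hub import hf_hub_download

repo = "deepseek-ai/DeepSeek-V3.1"

# config.json: the model's positional limit
# (160K = 163840 tokens, per the value discussed above)
with open(hf_hub_download(repo, "config.json")) as f:
    config = json.load(f)
print("max_position_embeddings:", config.get("max_position_embeddings"))

# tokenizer_config.json: the tokenizer's declared limit
# (128K = 131072 tokens, per the value discussed above)
with open(hf_hub_download(repo, "tokenizer_config.json")) as f:
    tok_config = json.load(f)
print("model_max_length:", tok_config.get("model_max_length"))
```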