OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview · Query about `model_max

Hello, and thank you for this fantastic model.

I have a quick question about the configuration. I noticed that config.json sets max_position_embeddings to 131072, but the tokenizer_config.json has model_max_length set to 16384.

This causes a confusing situation when serving the model with vLLM. The startup log correctly shows that the engine is using the full context:
INFO ... Using max model len 131072

However, the API server still throws a validation warning based on the tokenizer's setting for any input over 16k tokens:
Token indices sequence length is longer than the specified maximum sequence length for this model (38682 > 16384).

Is this discrepancy intentional?

Anw, thanks again for all your hard work on this releases!

OpenGVLab
/

InternVL3_5-GPT-OSS-20B-A4B-Preview

Query about `model_max_length` configuration