Why is max_position_embeddings set to 40K in config.json but original_max_position_embeddings is 32K in the README? Which one should be used?
#17 · opened by mmyin
In config.json, max_position_embeddings is set to 40K (40,960), while in the README, under the YaRN configuration, original_max_position_embeddings is set to 32K (32,768). Which value should be used? And if it is 40K, does the YaRN extension mean the context would expand to 160K?
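For reference, the README's YaRN recipe applies the scaling factor to original_max_position_embeddings rather than to the 40K value in config.json, so factor 4.0 over 32,768 gives 131,072, not 4 × 40,960. A minimal sketch of enabling it at serve time with vLLM, following the README's recipe (the model name here is a placeholder, substitute your actual checkpoint):

```bash
# Sketch: enable YaRN at serve time instead of editing config.json.
# factor 4.0 applies to original_max_position_embeddings (32,768) -> 131,072 total.
vllm serve Qwen/Qwen3-8B \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' \
  --max-model-len 131072
```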
"如果未指定 --max-model-len,config.json 中的默认 max_position_embeddings 被设置为 40,960,vLLM 将使用该值。此分配包括为输出保留 32,768 个 token,为典型提示保留 8,192 个 token,这足以应对大多数涉及短文本处理的场景,并为模型思考留出充足空间。如果平均上下文长度不超过 32,768 个 token,我们不建议在此场景中启用 YaRN,因为这可能会降低模型性能。"
from https://qwen.readthedocs.io/zh-cn/latest/deployment/vllm.html
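In other words, the quoted guidance suggests leaving YaRN off for short contexts and, if anything, capping the context length explicitly. A hedged sketch of both options (model name again a placeholder):

```bash
# Default: without --max-model-len, vLLM reads max_position_embeddings (40,960) from config.json.
vllm serve Qwen/Qwen3-8B

# If average contexts stay within 32,768 tokens, cap the length rather than enabling YaRN.
vllm serve Qwen/Qwen3-8B --max-model-len 32768
```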