Awesome model! Can we get a version with a larger context window?

#15
by seall0 - opened

Great model! I use it to make detailed minutes from transcripts of really long meetings in Portuguese, and it absolutely blows Mixtral and everything else out of the water. It leaves no detail behind. It gets a bit too "creative" sometimes, but I reduced the temperature and it's better now. Awesome model!

The context window could be higher for my use case though, 64k would be perfect. Can we get that?

Not sure why it's missing from the model card, but the README on GitHub does have the info: https://github.com/THUDM/GLM-4/blob/main/README.md#handling-long-context-yarn

If the total input + output token count might exceed the model's native context length (mostly 32k for the GLM-4-0414 series), it is recommended to enable YaRN to achieve better long-context modeling capabilities. For supported frameworks, you can modify the corresponding config.json. Specifically, for GLM-Z1 series models, consider enabling YaRN (Rope Scaling) when the input length exceeds 8,192 tokens.

"rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
}

For most user requests, if the input + output token count does not exceed the native context length, no modifications are needed.
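For frameworks that read `config.json` directly, the edit can be scripted. Here is a minimal sketch that patches a local checkpoint's `config.json` with the `rope_scaling` entry from the README (the function name, path handling, and defaults are my own, not from the repo):

```python
import json
from pathlib import Path


def enable_yarn(config_path: str, factor: float = 4.0,
                original_max: int = 32768) -> dict:
    """Patch a model's config.json with the YaRN rope_scaling entry.

    Assumes config_path points at a downloaded checkpoint's config.json.
    With factor=4.0 and a 32k native window, positions extend to ~128k.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["rope_scaling"] = {
        "factor": factor,
        "original_max_position_embeddings": original_max,
        "type": "yarn",
    }
    path.write_text(json.dumps(config, indent=2))
    return config
```

As the README notes, you'd only apply this when input + output may exceed the native window; otherwise leave the config untouched.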
