Awesome model! Can we get a version with a larger context window?

#15
by seall0 - opened

Great model! I use it to make detailed minutes from transcripts of really long meetings in Portuguese, and it absolutely blows Mixtral and everything else out of the water. It leaves no detail behind. It gets a bit too "creative" sometimes, but I reduced the temperature and it's better now. Awesome model!

The context window could be higher for my use case though, 64k would be perfect. Can we get that?

Not sure why it's missing from the model card, but the README on GitHub does have the info: https://github.com/THUDM/GLM-4/blob/main/README.md#handling-long-context-yarn

If the total input + output token count might exceed the model's native context length (mostly 32k for the GLM-4-0414 series), it is recommended to enable YaRN to achieve better long-context modeling capabilities. For supported frameworks, you can modify the corresponding config.json. Specifically, for GLM-Z1 series models, consider enabling YaRN (Rope Scaling) when the input length exceeds 8,192 tokens.

"rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
}

For most user requests, if the input + output token count does not exceed the native context length, no modifications are needed.
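For frameworks that read `config.json` directly, the edit can be scripted. Here is a minimal sketch that patches a local checkpoint's `config.json` with the `rope_scaling` entry from the README (the function name, path handling, and defaults are my own, not from the repo):

```python
import json
from pathlib import Path


def enable_yarn(config_path: str, factor: float = 4.0,
                original_max: int = 32768) -> dict:
    """Patch a model's config.json with the YaRN rope_scaling entry.

    Assumes config_path points at a downloaded checkpoint's config.json.
    With factor=4.0 and a 32k native window, positions extend to ~128k.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["rope_scaling"] = {
        "factor": factor,
        "original_max_position_embeddings": original_max,
        "type": "yarn",
    }
    path.write_text(json.dumps(config, indent=2))
    return config
```

As the README notes, you'd only apply this when input + output may exceed the native window; otherwise leave the config untouched.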
