This model is not trained for function calling, right?
As context: I am thinking about multi-turn applications for which long context understanding would be super helpful.
In practice, performance on long texts is not as good as that of QwQ-32B with YaRN enabled.
Hello, we also provide reference code for enabling YaRN when processing long documents in the updated README. For inputs where the total length (including both input and output) significantly exceeds 32,768 tokens, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the YaRN method.
{
  ...,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
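For reference, here is a minimal Python sketch of applying the same YaRN settings at load time with Hugging Face transformers, instead of editing config.json by hand. The repo id Tongyi-Zhiwen/QwenLong-L1-32B is an assumption; substitute whichever checkpoint you are actually loading.

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; replace with the checkpoint you are using.
model_id = "Tongyi-Zhiwen/QwenLong-L1-32B"

# Enable YaRN RoPE scaling: factor 4.0 over the native 32,768-token window
# targets roughly 131,072 tokens of context, matching the config above.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)

Note that this static YaRN scaling is applied to all inputs, so it is usually worth enabling only when your inputs actually approach or exceed 32,768 tokens.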
So, you mean QwenLong-L1-32B doesn't natively handle the 128k context, and YaRN is needed for contexts beyond 32k? Ok, I'll try.