This model is not trained for function calling, right?
As context: I am thinking about multi-turn applications for which long context understanding would be super helpful.
In practice, performance on long texts is not as good as that of QwQ-32B with YaRN enabled.
Hello, we also provide reference code for enabling YaRN when processing long documents in the updated README. For inputs where the total length (including both input and output) significantly exceeds 32,768 tokens, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the YaRN method.
{
  ...,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
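For reference, here is a minimal Python sketch of applying the same YaRN settings at load time with Hugging Face transformers, instead of editing config.json by hand. The repo id Tongyi-Zhiwen/QwenLong-L1-32B is an assumption; substitute whichever checkpoint you are actually loading.

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; replace with the checkpoint you are using.
model_id = "Tongyi-Zhiwen/QwenLong-L1-32B"

# Enable YaRN RoPE scaling: factor 4.0 over the native 32,768-token window
# targets roughly 131,072 tokens of context, matching the config above.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)

Note that this static YaRN scaling is applied to all inputs, so it is usually worth enabling only when your inputs actually approach or exceed 32,768 tokens.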
So, you mean QwenLong-L1-32B doesn't natively handle the 128k context, and YaRN is needed for contexts beyond 32k? Ok, I'll try.