Full model context length & default settings (max_position_embeddings)
Hello,
According to the config, the max model length is 8k: https://huggingface.co/CohereLabs/c4ai-command-r7b-12-2024/blob/main/config.json#L18
vLLM also takes this value as-is, and according to this comment on llama.cpp https://github.com/ggml-org/llama.cpp/pull/10900#discussion_r1894397776 we might need to adjust the RoPE settings ourselves.
Meanwhile, in older Command R models the max_position_embeddings setting matches the reported maximum context length: https://huggingface.co/CohereLabs/c4ai-command-r-08-2024/blob/main/config.json#L15
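For reference, here is a quick way to compare what the two configs actually report (just a sketch using curl against the raw files on the Hub; you may need a token if the repos are gated for you):

```sh
# Print the configured max_position_embeddings for both repos.
# Add -H "Authorization: Bearer $HF_TOKEN" if the repo requires accepting the license first.
curl -s https://huggingface.co/CohereLabs/c4ai-command-r7b-12-2024/raw/main/config.json | grep max_position_embeddings
curl -s https://huggingface.co/CohereLabs/c4ai-command-r-08-2024/raw/main/config.json | grep max_position_embeddings
```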
What settings do you use to run it at full size in llama.cpp and vLLM?
Thanks.
Hi @LPN64,
To run at full size with vLLM, we recommend setting max_position_embeddings=256000. Although in theory this number can go as high as memory allows, we cannot guarantee model quality for sequence lengths beyond 256k.
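If you are launching with `vllm serve`, one way to apply that is via the `--hf-overrides` flag shown later in this thread (a sketch only; the `--max-model-len` value below is just an example, pick whatever fits your workload and memory):

```sh
# Sketch: override the config's max_position_embeddings when serving with vLLM.
# Adjust --max-model-len to the longest sequence you actually need.
vllm serve CohereLabs/c4ai-command-r7b-12-2024 \
  --max-model-len 131072 \
  --hf-overrides '{"max_position_embeddings": 256000}'
```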
Thanks.
Thanks for the quick answer.
As of right now:
`VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve CohereLabs/c4ai-command-r7b-12-2024 --max-model-len 31000`
crashes vLLM.
I had to add `--hf-overrides "{\"max_position_embeddings\": 131072}"` to make it work.
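Putting the pieces together, the working invocation looks roughly like this:

```sh
# Working invocation: raise max_position_embeddings via --hf-overrides and
# cap the served context with --max-model-len.
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve CohereLabs/c4ai-command-r7b-12-2024 \
  --max-model-len 31000 \
  --hf-overrides "{\"max_position_embeddings\": 131072}"
```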
I ran a few tests comparing vLLM, llama.cpp, and the Hugging Face transformers library; so far HF gives the best results. I'll run more tests tomorrow.