ValueError Rope Scaling

#22 · opened by clawvyrin

Hi, when I try to deploy the model via an HF Inference Endpoint, I get this: [Server Message] Endpoint Failed to Start.

With the following details:

Exit code: 1. Reason (truncated config dump followed by the error):

```
        1.0,
        1.0,
        1.0,
        1.0
      ],
      "type": "longrope"
    },
    "rope_theta": 10000.0,
    "transformers_version": "4.48.3",
    "use_cache": true,
    "vocab_size": 200064
  }

ValueError: rope_scaling's short_factor field must have length 64, got 48
{"timestamp":"2025-03-17T11:18:59.041473Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
{"timestamp":"2025-03-17T11:18:59.041521Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
Error: ShardCannotStart
```

I'm pretty new to using HF Endpoints, so I just wanted to know whether there is a way I can fix this myself, or if I need to wait for a model update or something like that.

Same thing here too:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in <cell line: 0>()
----> 1 generate(PHI3, messages)

5 frames
/usr/local/lib/python3.11/dist-packages/transformers/models/phi3/configuration_phi3.py in _rope_scaling_validation(self)
    206                 )
    207             if not len(rope_scaling_short_factor) == self.hidden_size // self.num_attention_heads // 2:
--> 208                 raise ValueError(
    209                     f"rope_scaling's short_factor field must have length {self.hidden_size // self.num_attention_heads // 2}, got {len(rope_scaling_short_factor)}"
    210                 )

ValueError: rope_scaling's short_factor field must have length 64, got 48
```

Microsoft org

Hi @clawvyrin and @3mar2000 ,

Thanks for your interest!
Yes, support for this model was added in the latest transformers (v4.49.0) and vLLM (v0.7.3).

Can you upgrade your transformers version, or patch in the changes below, and try again?

vLLM: https://github.com/vllm-project/vllm/pull/12718
HF: https://github.com/huggingface/transformers/pull/35947
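
As a quick local sanity check after upgrading, something like the following should work (a minimal sketch; it only assumes the model id from this page and the `rope_scaling` fields visible in the log above):

```python
# pip install --upgrade "transformers>=4.49.0"
from transformers import AutoConfig

# Instantiating the config runs the rope_scaling validation; on an
# older transformers this line raises the ValueError shown above.
config = AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct")

print(config.rope_scaling["type"])               # "longrope"
print(len(config.rope_scaling["short_factor"]))  # 48 once the fix is in
```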

Hi @ykim362, I'm not using Transformers but InferenceClient, and it still doesn't work, unfortunately.
I am trying to deploy the model directly from this page: https://huggingface.co/microsoft/Phi-4-mini-instruct

If you want to fix it yourself, patch the validation in `configuration_phi3.py` so that it accounts for the partial rotary factor (here hard-coded as 0.75):

```python
rotary_ndims = int(self.hidden_size // self.num_attention_heads * 0.75)
if not len(rope_scaling_short_factor) == rotary_ndims // 2:
    raise ValueError(
        f"rope_scaling's short_factor field must have length {rotary_ndims // 2}, got {len(rope_scaling_short_factor)}"
    )
```
