
Error running with SageMaker JumpStart + Text Generation Inference: weight model.layers.0.self_attn.rotary_emb.inv_freq does not exist

#5 opened by bikalnetomi

I am trying to run the model using Text Generation Inference and I am getting the following error.

I used the SageMaker JumpStart sample code:

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Hub model configuration: https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'TheBloke/OpenOrca-Platypus2-13B-GPTQ',
    'SM_NUM_GPUS': json.dumps(1)
}

# Create the Hugging Face Model class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
    env=hub,
    role=role,
)

# Deploy the model to a SageMaker real-time inference endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.8xlarge",
    container_startup_health_check_timeout=300,
)
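
For context, this is roughly how the endpoint would be invoked once it comes up healthy (a minimal sketch; the prompt and generation parameters below are placeholders, not part of the original deployment):

# Hypothetical invocation; the payload follows the TGI "inputs"/"parameters" JSON schema.
response = predictor.predict({
    "inputs": "### Instruction:\n\nTell me about orcas.\n\n### Response:\n",
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
})
print(response)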

Any help resolving the following error would be appreciated.

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 142, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/init.py", line 185, in get_model
return FlashLlama(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 65, in init
model = FlashLlamaForCausalLM(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 452, in init
self.model = FlashLlamaModel(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 390, in init
[
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 391, in
FlashLlamaLayer(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 326, in init
self.self_attn = FlashLlamaAttention(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 183, in init
self.rotary_emb = PositionRotaryEmbedding.load(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 395, in load
inv_freq = weights.get_tensor(f"{prefix}.inv_freq")
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 62, in get_tensor
filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight model.layers.0.self_attn.rotary_emb.inv_freq does not exist
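
A quick way to confirm the tensor really is absent from the checkpoint (a sketch; it assumes the shard is named model.safetensors in this repo, and that huggingface_hub and safetensors are installed locally):

from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download the quantized checkpoint and list its tensor names.
# NOTE: the filename "model.safetensors" is an assumption about this repo's layout.
path = hf_hub_download("TheBloke/OpenOrca-Platypus2-13B-GPTQ", "model.safetensors")
with safe_open(path, framework="pt") as f:
    names = list(f.keys())

# This is the buffer TGI 0.9.3 asks for in the traceback above.
print("model.layers.0.self_attn.rotary_emb.inv_freq" in names)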

Were you able to resolve this?
