ggml conversion error

#1
opened by Reggie

This is awesome but the conversion to ggml for llama.cpp seems to be erroring out.
My command: python3 convert-gptq-to-ggml.py ../llama_models/vicuna-13b-GPTQ-4bit-128g/vicuna-13b-GPTQ-4bit-128g.pt ../llama_models/vicuna-13b-GPTQ-4bit-128g/tokenizer.model ggml-vicuna-13b-GPTQ-4bit-128g

The error:

Processing non-Q4 variable: model.embed_tokens.weight with shape: torch.Size([32001, 5120]) and type: torch.float16
Processing non-Q4 variable: model.norm.weight with shape: torch.Size([5120]) and type: torch.float16
  Converting to float32
Processing non-Q4 variable: lm_head.weight with shape: torch.Size([32001, 5120]) and type: torch.float16
Traceback (most recent call last):
  File "/home/sravanth/llama.cpp/convert-gptq-to-ggml.py", line 156, in <module>
    convert_q4(f"model.layers.{i}.self_attn.q_proj", f"layers.{i}.attention.wq.weight", permute=True)
  File "/home/sravanth/llama.cpp/convert-gptq-to-ggml.py", line 97, in convert_q4
    zeros = model[f"{src_name}.zeros"].numpy()
KeyError: 'model.layers.0.self_attn.q_proj.zeros'

Any pointers on fixing it?
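For context, the KeyError suggests the checkpoint was produced by a newer GPTQ-for-LLaMa version that stores qzeros / scales / g_idx tensors, while the script looks for the older zeros naming. A quick way to check which keys the .pt actually contains (a diagnostic sketch, assuming the file is a plain state dict; the path is just the one from my command):

import torch

state = torch.load(
    "../llama_models/vicuna-13b-GPTQ-4bit-128g/vicuna-13b-GPTQ-4bit-128g.pt",
    map_location="cpu",
)
# Print the q_proj-related tensor names to see whether the checkpoint
# uses the old zeros/scales naming or the newer qzeros/g_idx naming.
for name in state.keys():
    if "q_proj" in name:
        print(name)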

Use the safetensors, if possible.

The converter doesn't seem to support safetensors yet, but that may be coming soon: https://github.com/ggerganov/llama.cpp/issues/688
Guess I'll wait.
Thanks
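
In the meantime, one possible workaround might be to repack the safetensors file into a .pt that convert-gptq-to-ggml.py can torch.load (a sketch, assuming the safetensors file holds the same tensors; file names are just examples):

import torch
from safetensors.torch import load_file

# Load the tensors from the safetensors file and re-save them as a
# regular PyTorch checkpoint (example file names).
tensors = load_file("vicuna-13b-GPTQ-4bit-128g.safetensors")
torch.save(tensors, "vicuna-13b-GPTQ-4bit-128g.pt")

Note this only changes the container format; if the checkpoint uses the newer qzeros layout, the same KeyError would come back.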

Is this: https://huggingface.co/eachadea/ggml-vicuna-13b-4bit
the ggml version of your repo, by any chance? It doesn't say whether GPTQ was used, so I was a bit confused.

I converted to ggml from https://huggingface.co/eachadea/vicuna-13b (I merged it from the delta weights on my own system).
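
For reference, the merge itself is essentially adding the published delta tensors onto the base LLaMA weights. A rough sketch of the idea (paths are placeholders; the real FastChat tooling also handles details like the extra vocab/tokenizer entries):

import torch
from transformers import AutoModelForCausalLM

# Load the base LLaMA weights and the Vicuna delta weights (placeholder paths).
base = AutoModelForCausalLM.from_pretrained("path/to/llama-13b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained("path/to/vicuna-13b-delta", torch_dtype=torch.float16)

# Merge: for each tensor, merged = base + delta (assumes matching shapes).
base_sd = base.state_dict()
delta_sd = delta.state_dict()
for name in base_sd:
    base_sd[name] += delta_sd[name]

base.save_pretrained("path/to/vicuna-13b-merged")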

Was GPTQ used in that conversion?

No, ggml is a separate format with its own quantization implementation; GPTQ is not involved and doesn't need to be.
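
To illustrate the difference: ggml's 4-bit formats quantize the float weights block by block at conversion time, with a per-block scale. A simplified sketch of that idea (not the exact q4_0 on-disk layout, which also packs the values into nibbles):

import numpy as np

def quantize_blocks(x, block_size=32):
    # One scale per block of 32 weights, values rounded into [-8, 7].
    x = x.reshape(-1, block_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_blocks(q, scale):
    return q.astype(np.float32) * scale

weights = np.random.randn(4096).astype(np.float32)
q, s = quantize_blocks(weights)
print("max abs error:", np.abs(dequantize_blocks(q, s).reshape(-1) - weights).max())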
