ggml conversion error
This is awesome, but the conversion to ggml for llama.cpp seems to be erroring out.
My command: python3 convert-gptq-to-ggml.py ../llama_models/vicuna-13b-GPTQ-4bit-128g/vicuna-13b-GPTQ-4bit-128g.pt ../llama_models/vicuna-13b-GPTQ-4bit-128g/tokenizer.model ggml-vicuna-13b-GPTQ-4bit-128g
The error:
Processing non-Q4 variable: model.embed_tokens.weight with shape: torch.Size([32001, 5120]) and type: torch.float16
Processing non-Q4 variable: model.norm.weight with shape: torch.Size([5120]) and type: torch.float16
Converting to float32
Processing non-Q4 variable: lm_head.weight with shape: torch.Size([32001, 5120]) and type: torch.float16
Traceback (most recent call last):
File "/home/sravanth/llama.cpp/convert-gptq-to-ggml.py", line 156, in <module>
convert_q4(f"model.layers.{i}.self_attn.q_proj", f"layers.{i}.attention.wq.weight", permute=True)
File "/home/sravanth/llama.cpp/convert-gptq-to-ggml.py", line 97, in convert_q4
zeros = model[f"{src_name}.zeros"].numpy()
KeyError: 'model.layers.0.self_attn.q_proj.zeros'
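The KeyError suggests the per-layer `zeros` tensors the script expects aren't stored under that name in the checkpoint. A quick way to see what the file actually contains is to dump the relevant state-dict keys (a minimal diagnostic sketch, assuming the .pt loads as a plain state dict the way convert-gptq-to-ggml.py expects; the path matches my command above):

```python
# Minimal diagnostic sketch: list the tensors stored in the GPTQ checkpoint.
# Assumes the .pt file is a plain state dict; path matches the command above.
import torch

model = torch.load(
    "../llama_models/vicuna-13b-GPTQ-4bit-128g/vicuna-13b-GPTQ-4bit-128g.pt",
    map_location="cpu",
)

# Print every key for the first attention q_proj to see whether the checkpoint
# actually stores a ".zeros" tensor or uses some other naming.
for name in model:
    if name.startswith("model.layers.0.self_attn.q_proj"):
        value = model[name]
        print(name, tuple(value.shape) if hasattr(value, "shape") else type(value))
```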
Any pointers on fixing it?
Use the safetensors, if possible.
Doesn't seem to support safetensors yet. But may be coming soon: https://github.com/ggerganov/llama.cpp/issues/688
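For reference, the safetensors checkpoint itself is still easy to inspect from Python even though the converter can't consume it yet (a minimal sketch using the `safetensors` package; the filename here is hypothetical):

```python
# Minimal sketch: read tensors out of a .safetensors checkpoint for inspection.
# Requires `pip install safetensors`; the filename below is hypothetical.
from safetensors.torch import load_file

tensors = load_file("vicuna-13b-GPTQ-4bit-128g.safetensors")
for name, tensor in list(tensors.items())[:10]:
    print(name, tensor.dtype, tuple(tensor.shape))
```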
Guess I'll wait.
Thanks
Is this: https://huggingface.co/eachadea/ggml-vicuna-13b-4bit
the ggml version of your repo, by any chance? It doesn't say whether GPTQ was used, so I was a bit confused.
Is this: https://huggingface.co/eachadea/ggml-vicuna-13b-4bit
the ggml version of your repo by any chance. Doesn't say if gptq was used etc. So was kind of confused.
I converted it to ggml from https://huggingface.co/eachadea/vicuna-13b (merged from the delta weights on my own system).
Was GPTQ used in that conversion?
No, ggml is a separate format with its own quantization implementation; GPTQ is not and shouldn't be involved.
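To illustrate the difference: ggml's 4-bit formats quantize weights in small fixed-size blocks, each with its own scale, entirely independently of GPTQ. Here is a rough illustrative sketch of that idea (the block size of 32 matches ggml's Q4_0, but the rounding details are simplified assumptions, not ggml's exact code):

```python
# Illustrative 4-bit block quantization in the spirit of ggml's Q4_0.
# One scale per block of 32 weights; each weight stored as a 4-bit integer.
# This is a simplified sketch, not the exact rounding ggml uses.
import numpy as np

def quantize_q4_block(x: np.ndarray):
    assert x.size == 32  # Q4_0 uses blocks of 32 weights
    amax = np.abs(x).max()
    scale = amax / 7.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)  # 4-bit signed range
    return scale, q

def dequantize_q4_block(scale, q):
    return scale * q.astype(np.float32)

# Round-trip a random block to see the quantization error.
block = np.random.randn(32).astype(np.float32)
scale, q = quantize_q4_block(block)
print("max abs error:", np.abs(dequantize_q4_block(scale, q) - block).max())
```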