Converting to ggml and quantizing with llama.cpp
After applying the XOR deltas, I tried to convert the weights to GGML format using the latest llama.cpp convert.py (commit 0e018fe008eacebdbcfa2d61b6c988c245c961cd), with this command:
python3 convert.py --outfile ~/models/oasst-sft-6-llama-30b-float.bin ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/
which resulted in the following error:
Loading vocab file ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/tokenizer.model
Traceback (most recent call last):
File "~/models/llama.cpp/convert.py", line 1149, in <module>
main()
File "~/models/llama.cpp/convert.py", line 1144, in main
OutputFile.write_all(outfile, params, model, vocab)
File "~/models/llama.cpp/convert.py", line 942, in write_all
check_vocab_size(params, vocab)
File "~/models/llama.cpp/convert.py", line 896, in check_vocab_size
raise Exception(msg)
Exception: Vocab size mismatch (model has 32016, but ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/tokenizer.model combined with ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/added_tokens.json has 32005).
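The mismatch is between the embedding size baked into the checkpoint (32016) and the tokenizer's combined vocabulary: the base LLaMA sentencepiece model has 32000 pieces, and the shipped added_tokens.json only covered 5 more, leaving 11 tokens unaccounted for. A quick way to see the gap before touching anything is a sketch like the following, assuming sentencepiece is installed and that config.json reports the model's vocab_size (both assumptions, not something convert.py requires):

# Hypothetical diagnostic mirroring convert.py's vocab size check.
import json
from pathlib import Path

import sentencepiece as spm  # pip install sentencepiece

model_dir = Path("~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b").expanduser()

sp = spm.SentencePieceProcessor()
sp.Load(str(model_dir / "tokenizer.model"))

added = json.loads((model_dir / "added_tokens.json").read_text())
combined = sp.GetPieceSize() + len(added)

# LLaMA HF checkpoints store the embedding size in config.json.
expected = json.loads((model_dir / "config.json").read_text())["vocab_size"]

print(f"tokenizer + added_tokens: {combined}, model expects: {expected}")
print(f"missing tokens: {expected - combined}")  # 32016 - 32005 = 11 here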
To fix this, I updated ~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b/added_tokens.json to include the missing tokens:
{
  "<|prefix_begin|>": 32000,
  "<|system|>": 32001,
  "<|prompter|>": 32002,
  "<|prefix_end|>": 32003,
  "<|assistant|>": 32004,
  "<|MEGACHONK|>": 32005,
  "<|SUPERCHONK|>": 32006,
  "<|BABYCHONK|>": 32007,
  "<|oh_lawd_he_comin|>": 32008,
  "<|mega_chonk|>": 32009,
  "<|super_chonk|>": 32010,
  "<|baby_chonk|>": 32011,
  "<|ohlawdhecomin|>": 32012,
  "<|megachonk|>": 32013,
  "<|superchonk|>": 32014,
  "<|babychonk|>": 32015
}
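The token names above are the ones I added by hand. If you only need to satisfy convert.py's size check, a placeholder-padding sketch like this would also work; the placeholder names and the reliance on config.json's vocab_size are assumptions of this sketch, not anything convert.py prescribes:

# Sketch: pad added_tokens.json up to the model's vocab_size with
# placeholder names; only the count and contiguous IDs matter for the check.
import json
from pathlib import Path

import sentencepiece as spm

model_dir = Path("~/models/oasst-sft-6-llama-30b-xor/oasst-sft-6-llama-30b").expanduser()

sp = spm.SentencePieceProcessor()
sp.Load(str(model_dir / "tokenizer.model"))

added_path = model_dir / "added_tokens.json"
added = json.loads(added_path.read_text())

expected = json.loads((model_dir / "config.json").read_text())["vocab_size"]

# Assign consecutive IDs after the last known token.
next_id = sp.GetPieceSize() + len(added)
while next_id < expected:
    added[f"<|extra_{next_id}|>"] = next_id  # hypothetical placeholder name
    next_id += 1

added_path.write_text(json.dumps(added, indent=2, sort_keys=True))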
This fixed the error and the conversion succeeded. I then quantized the model with this command (the trailing 2 selects the q4_0 quantization type):
./llama.cpp/quantize ~/models/oasst-sft-6-llama-30b-float.bin ~/models/ggml-oasst-sft-6-llama-30b-q4_0.bin 2
Then I ran inference with this command:
./llama.cpp/main -m ggml-oasst-sft-6-llama-30b-q4_0.bin -p "<|prompter|>: Suppose I have a cabbage, a goat and a lion, and I need to get them across a river. I have a boat that can only carry myself and a single other item. I am not allowed to leave the cabbage and lion alone together, and I am not allowed to leave the lion and goat alone together. How can I safely get all three across? <|assistant|>: " -n 1000
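For scripted testing, here's a minimal sketch that wraps a question in the same chat markers and shells out to the llama.cpp binary. The marker format and paths are just what I used above, not an official prompt template or API:

# Sketch: build the prompt and invoke the llama.cpp main binary.
import subprocess

def oasst_prompt(user_message: str) -> str:
    # Wrap the user message in the special tokens added earlier.
    return f"<|prompter|>: {user_message} <|assistant|>: "

prompt = oasst_prompt(
    "Suppose I have a cabbage, a goat and a lion, and I need to get them "
    "across a river. ..."
)

subprocess.run(
    [
        "./llama.cpp/main",
        "-m", "ggml-oasst-sft-6-llama-30b-q4_0.bin",
        "-p", prompt,
        "-n", "1000",
    ],
    check=True,
)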
That's great news!! Can you post the ggml weights on HF, pretty please? :)
Share the weights please!
So about 2 minutes total response time on CPU for this prompt with GGML? Is this an M1/M2 chip or Intel/AMD?
I'm seeing decently fast responses on an M2 Pro with 32 GB RAM, so it's not bad.
Where can I get the bin file of the ~/models/oasst-sft-6-llama-30b-float.bin model?