|
|
|
### Running the LLaMA 65B ggml model weights with alpaca.cpp
|
### How the 65B ggml weights were made
|
#### 1. Clone the 65B model data
|
```shell |
|
git clone https://huggingface.co/datasets/nyanko7/LLaMA-65B/ |
|
``` |
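The 65B checkpoint is large (roughly 130 GB of `.pth` shards), and the conversion below also needs room for an f16 copy plus the quantized output, so it is worth checking free disk space before cloning. A minimal sketch; the 300 GB budget is my assumption:

```shell
# rough budget: ~130 GB checkpoints + ~130 GB f16 + ~40 GB q4_0
required_gb=300
avail_gb=$(df -Pk . | awk 'NR==2 {print int($4/1024/1024)}')
echo "available: ${avail_gb} GB (want >= ${required_gb} GB)"
```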
|
#### 2. Clone alpaca.cpp
|
```shell |
|
git clone https://github.com/antimatter15/alpaca.cpp |
|
``` |
|
#### 3. Convert and quantize the weights
|
```shell
cd alpaca.cpp

# the converter expects tokenizer.model one level above the model directory
mv ../LLaMA-65B/tokenizer.model ../

# convert the PyTorch checkpoints to ggml f16 (the trailing 1 selects f16 output)
python convert-pth-to-ggml.py ../LLaMA-65B/ 1

# move the converted weights into the layout quantize.sh expects
mkdir -p models/65B
mv ../LLaMA-65B/ggml-model-f16.bin models/65B/
mv ../LLaMA-65B/ggml-model-f16.bin.* models/65B/

# quantize f16 down to q4_0
bash quantize.sh 65B
```
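As a sanity check on the files this step produces, a back-of-envelope size estimate, assuming 65B parameters, 2 bytes per weight for f16, and the original ggml q4_0 layout (blocks of 32 weights sharing one fp32 scale, i.e. 20 bytes per block):

```shell
params=65000000000
f16_gb=$(( params * 2 / 1000000000 ))        # 2 bytes per weight
q4_gb=$(( params / 32 * 20 / 1000000000 ))   # 20 bytes per 32-weight block
echo "expected f16: ~${f16_gb} GB, q4_0: ~${q4_gb} GB"
```

If `ggml-model-f16.bin` (plus its `.bin.*` shards) or the quantized file is far off these figures, the conversion likely failed partway.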
|
|
|
#### 4. Upload the weight files

Uploading directly was slow; a straight upload would have taken almost 2 days, so I worked around it:

- used https://tmp.link/ as a temporary store
- uploaded to Hugging Face from Colab via the huggingface API
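One way to push a ~40 GB file through a temporary store like tmp.link is to split it into chunks and reassemble after download. A sketch using coreutils `split`, demonstrated here on a small dummy file (the file names are placeholders; for the real weights substitute `ggml-model-q4_0.bin` and a larger chunk size):

```shell
# create a small dummy file standing in for the quantized weights
dd if=/dev/zero of=demo.bin bs=1M count=8 2>/dev/null

# split into 3 MB chunks (demo.bin.part00, demo.bin.part01, ...)
split -b 3M -d demo.bin demo.bin.part

# after downloading the chunks, reassemble and verify
cat demo.bin.part* > demo_reassembled.bin
cmp -s demo.bin demo_reassembled.bin && echo "reassembled OK"
```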
|
|
|
### Run

```shell
git clone https://github.com/antimatter15/alpaca.cpp
cd alpaca.cpp
make chat
./chat -m alpaca.cpp_65b_ggml/ggml-model-q4_0.bin
```
|
|