Update README.md

README.md

```shell
python ../llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
```
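This assumes a local `llama.cpp` checkout built one directory up; a possible setup, with paths inferred from the command above, is:

```shell
# Assumed layout: llama.cpp cloned and built one directory above this repo
git clone https://github.com/ggerganov/llama.cpp ../llama.cpp
cmake -B ../llama.cpp/build -S ../llama.cpp
cmake --build ../llama.cpp/build --config Release
# Python dependencies for convert_hf_to_gguf.py
pip install -r ../llama.cpp/requirements.txt
```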
Alternatively, you can download these weights [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).
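For example, one possible command-line download, assuming the `huggingface_hub` CLI is installed:

```shell
# Fetch only the F16 GGUF file from the model repository
huggingface-cli download IIC/RigoChat-7b-v2-GGUF rigochat-7b-v2-F16.gguf --local-dir .
```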
To quantize `rigochat-7b-v2-F16.gguf` into different sizes, first compute an importance matrix as follows:
```shell
llama-imatrix -m ./rigochat-7b-v2-F16.gguf -f train_data.txt -c 1024
```
where `train_data.txt` is a raw Spanish-text dataset used for calibration. This produces an `imatrix.dat` file that we can use to quantize the original model.
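How `train_data.txt` is assembled is up to you; as a hypothetical sketch (the `corpus/` directory is an assumption), any representative plain Spanish text can be concatenated into a single file:

```shell
# Hypothetical: merge plain-text Spanish documents into one calibration file
cat corpus/*.txt > train_data.txt
# Inspect the size of the resulting calibration file
wc -c train_data.txt
```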
For example, to obtain the `Q4_K_M` precision with this configuration, run:

```shell
llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
```
and so on. You can run:
```shell
llama-quantize --help
```
to see all the quantization options. To learn how the importance matrix works, [this example](https://github.com/ggerganov/llama.cpp/blob/master/examples/imatrix/README.md) can be useful. For more information on the quantization types, see [this link](https://huggingface.co/docs/hub/gguf#quantization-types).
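As an illustration (the list of target types and the output directory are arbitrary choices here), several quantizations can be produced in one pass:

```shell
# Produce a few common quantization levels from the same F16 model and imatrix
mkdir -p quantize_models
for q in Q4_K_M Q5_K_M Q6_K Q8_0; do
  llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf \
    ./quantize_models/rigochat-7b-v2-${q}.gguf ${q}
done
```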
#### Disclaimer
The `train_data.txt` dataset is optional for most quantizations. We used an experimental dataset to produce all the available quantizations. However, we highly recommend downloading the full-precision weights, `rigochat-7b-v2-F16.gguf`, and quantizing the model with your own datasets, adapted to your intended use case.
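One way to sanity-check a quantization against your own data is a perplexity comparison with llama.cpp's `llama-perplexity` tool; here `test_data.txt` stands in for your own held-out text:

```shell
# Compare perplexity of the F16 reference and a quantized model on held-out text
llama-perplexity -m ./rigochat-7b-v2-F16.gguf -f test_data.txt
llama-perplexity -m ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf -f test_data.txt
```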
## How to Get Started with the Model
You can run, for example:
```shell
llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system prompt." -fa -ngl -1 -n 512
```
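or serve the model through llama.cpp's built-in HTTP server (a sketch; the port is arbitrary and the remaining flags mirror the CLI example):

```shell
# Start an OpenAI-compatible HTTP server for the quantized model
llama-server -m ./rigochat-7b-v2-Q8_0.gguf -fa -ngl -1 --port 8080
```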
## Evaluation