Update README.md

README.md

```shell
python ../llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
```
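This assumes a local `llama.cpp` checkout built one directory up; a possible setup, with paths inferred from the command above, is:

```shell
# Assumed layout: llama.cpp cloned and built one directory above this repo
git clone https://github.com/ggerganov/llama.cpp ../llama.cpp
cmake -B ../llama.cpp/build -S ../llama.cpp
cmake --build ../llama.cpp/build --config Release
# Python dependencies for convert_hf_to_gguf.py
pip install -r ../llama.cpp/requirements.txt
```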
Alternatively, you can download these weights [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).
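For example, one possible command-line download, assuming the `huggingface_hub` CLI is installed:

```shell
# Fetch only the F16 GGUF file from the model repository
huggingface-cli download IIC/RigoChat-7b-v2-GGUF rigochat-7b-v2-F16.gguf --local-dir .
```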
To quantize `rigochat-7b-v2-F16.gguf` into different sizes, first compute an importance matrix as follows:
```shell
llama-imatrix -m ./rigochat-7b-v2-F16.gguf -f train_data.txt -c 1024
```
where `train_data.txt` is a raw Spanish-text dataset used for calibration. This produces an `imatrix.dat` file that we can use to quantize the original model.
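How `train_data.txt` is assembled is up to you; as a hypothetical sketch (the `corpus/` directory is an assumption), any representative plain Spanish text can be concatenated into a single file:

```shell
# Hypothetical: merge plain-text Spanish documents into one calibration file
cat corpus/*.txt > train_data.txt
# Inspect the size of the resulting calibration file
wc -c train_data.txt
```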
For example, to obtain the `Q4_K_M` precision with this configuration, run:

```shell
llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
```
and so on. You can run:
```shell
llama-quantize --help
```
to see all the quantization options. To learn how the importance matrix works, [this example](https://github.com/ggerganov/llama.cpp/blob/master/examples/imatrix/README.md) can be useful. For more information on the quantization types, see [this link](https://huggingface.co/docs/hub/gguf#quantization-types).
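As an illustration (the list of target types and the output directory are arbitrary choices here), several quantizations can be produced in one pass:

```shell
# Produce a few common quantization levels from the same F16 model and imatrix
mkdir -p quantize_models
for q in Q4_K_M Q5_K_M Q6_K Q8_0; do
  llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf \
    ./quantize_models/rigochat-7b-v2-${q}.gguf ${q}
done
```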
#### Disclaimer
The `train_data.txt` dataset is optional for most quantizations. We used an experimental dataset to produce all the available quantizations. However, we highly recommend downloading the full-precision weights, `rigochat-7b-v2-F16.gguf`, and quantizing the model with your own datasets, adapted to your intended use case.
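One way to sanity-check a quantization against your own data is a perplexity comparison with llama.cpp's `llama-perplexity` tool; here `test_data.txt` stands in for your own held-out text:

```shell
# Compare perplexity of the F16 reference and a quantized model on held-out text
llama-perplexity -m ./rigochat-7b-v2-F16.gguf -f test_data.txt
llama-perplexity -m ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf -f test_data.txt
```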
## How to Get Started with the Model
You can run, for example:
```shell
llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system prompt." -fa -ngl -1 -n 512
```
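or serve the model through llama.cpp's built-in HTTP server (a sketch; the port is arbitrary and the remaining flags mirror the CLI example):

```shell
# Start an OpenAI-compatible HTTP server for the quantized model
llama-server -m ./rigochat-7b-v2-Q8_0.gguf -fa -ngl -1 --port 8080
```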
## Evaluation