---
library_name: transformers
language:
- es
base_model:
- IIC/RigoChat-7b-v2
pipeline_tag: text-generation
license: cc-by-nc-4.0
license_link: https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/LICENSE
tags:
- chat
---

# Model Card for RigoChat-7b-v2-GGUF

## Introduction

This repository contains the [IIC/RigoChat-7b-v2](https://huggingface.co/IIC/RigoChat-7b-v2) model in GGUF format, with the original weights and quantized versions at different precisions. The [llama.cpp](https://github.com/ggerganov/llama.cpp) library was used both to convert the parameters to GGUF format and to perform the quantizations. Specifically, the following steps were used to obtain the model in full precision:

1. Download the weights:

```python
from huggingface_hub import snapshot_download
import os

model_id = "IIC/RigoChat-7b-v2"
os.environ["MODEL_DIR"] = snapshot_download(
    repo_id=model_id,
    local_dir="model",
    local_dir_use_symlinks=False,
    revision="main",
)
```

2. Convert the weights to `FP16`:

```shell
python ../llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
```

Alternatively, you can download these weights directly [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).

To quantize `rigochat-7b-v2-F16.gguf` into different sizes, we first compute an importance matrix as follows:

```shell
llama-imatrix -m ./rigochat-7b-v2-F16.gguf -f train_data.txt -c 1024
```

where `train_data.txt` is a Spanish raw-text dataset used for calibration. This generates an `imatrix.dat` file that we can use to quantize the original model. For example, to obtain the `Q4_K_M` precision with this configuration, run:

```shell
llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
```

and so on. You can run:

```shell
llama-quantize --help
```

to see all the quantization options. To see how the importance matrix works, [this example](https://github.com/ggerganov/llama.cpp/blob/master/examples/imatrix/README.md) may be useful. For more information on the quantization types, see [this link](https://huggingface.co/docs/hub/gguf#quantization-types).

#### Disclaimer

The `train_data.txt` calibration dataset is optional for most quantization types. We used an experimental dataset to produce all of the quantizations in this repository. However, we strongly recommend downloading the weights in full precision (`rigochat-7b-v2-F16.gguf`) and quantizing the model yourself with your own datasets, adapted to your intended use case.

## How to Get Started with the Model

You can chat with the model through `llama-cli`, for example:

```shell
llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system prompt." -fa -ngl -1 -n 512
```
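If you only need a single quantized file rather than the whole repository, you can fetch it directly with `huggingface_hub`. The snippet below is a minimal sketch; the file name (`rigochat-7b-v2-Q4_K_M.gguf`) is just one of the quantizations discussed above, so adjust it to the precision you want.

```python
from huggingface_hub import hf_hub_download

# Download one quantized file from this repository; pick any of the
# available GGUF precisions (Q4_K_M shown here as an example).
model_path = hf_hub_download(
    repo_id="IIC/RigoChat-7b-v2-GGUF",
    filename="rigochat-7b-v2-Q4_K_M.gguf",
)
print(model_path)  # local path to the downloaded GGUF file
```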
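To load the GGUF weights from Python instead of through the llama.cpp binaries, the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings can be used. This is a minimal sketch, not part of the workflow above: it assumes you have installed the package (`pip install llama-cpp-python`) and downloaded one of the quantized files locally, and the file name and sampling parameters are illustrative.

```python
from llama_cpp import Llama

# Load a quantized GGUF file (path and parameters are illustrative).
llm = Llama(
    model_path="./rigochat-7b-v2-Q4_K_M.gguf",
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

# The chat template is read from the GGUF metadata.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Eres un asistente útil que responde en español."},
        {"role": "user", "content": "¿Qué es el formato GGUF?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```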
## Evaluation

## Citation

```
@misc{instituto_de_ingenieria_del_conocimiento_2024,
  author    = { {Instituto de Ingeniería del Conocimiento} },
  title     = { Adapting a language model to Spanish using a bounded dataset and reduced hardware },
  year      = 2024,
  url       = { https://huggingface.co/datasets/IIC/RigoChat-7b-v2 },
  doi       = { 10.57967/hf/2043 },
  publisher = { Hugging Face }
}
```

## Model Card Contact

- [contacto.iic@iic.uam.es](mailto:contacto.iic@iic.uam.es)