gonzalo-santamaria-iic committed
Commit ac27130 (verified)
Parent(s): 0e19e06

Update README.md

Files changed (1): README.md +39 -2

README.md CHANGED
@@ -41,9 +41,46 @@ os.environ["MODEL_DIR"] = snapshot_download(
  python ../llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
  ```
 
- Yo can download this weights [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).
-
- ## How to Get Started with the Model
+ Nevertheless, you can download these weights [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).
+
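+ One possible way to fetch that file from the command line (a sketch, assuming the `huggingface_hub` CLI is installed; adjust the target directory as needed):
+
+ ```shell
+ # Download only the F16 GGUF file from the Hub into the current directory
+ huggingface-cli download IIC/RigoChat-7b-v2-GGUF rigochat-7b-v2-F16.gguf --local-dir .
+ ```
+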
+ To quantize `rigochat-7b-v2-F16.gguf` into different sizes, we first calculate an importance matrix as follows:
+
+ ```shell
+ llama-imatrix -m ./rigochat-7b-v2-F16.gguf -f train_data.txt -c 1024
+ ```
+
+ where `train_data.txt` is a Spanish raw-text dataset used for calibration. This generates an `imatrix.dat` file that we can use to quantize the original model. For example, to obtain the `Q4_K_M` precision with this configuration, run:
+
+ ```shell
+ llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
+ ```
+
+ and so on. You can run:
+
+ ```shell
+ llama-quantize --help
+ ```
+
+ to see all the quantization options. To check how `imatrix` works, [this example](https://github.com/ggerganov/llama.cpp/blob/master/examples/imatrix/README.md) can be useful. For more information on the quantization types, see [this link](https://huggingface.co/docs/hub/gguf#quantization-types).
+
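+ As an illustrative sketch (the output paths and the chosen set of types here are assumptions, not part of this card), several quantizations can be produced in a single loop:
+
+ ```shell
+ # Quantize the F16 weights to a few common types, reusing the same importance matrix
+ for q in Q4_K_M Q5_K_M Q8_0; do
+   llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-$q.gguf $q
+ done
+ ```
+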
+ #### Disclaimer
+
+ The `train_data.txt` dataset is optional for most quantizations. We used an experimental dataset to obtain all of the available quantizations. However, we highly recommend downloading the weights in full precision (`rigochat-7b-v2-F16.gguf`) and quantizing the model with your own dataset, adapted to your intended use case.
+
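+ As a minimal sketch of that workflow (the corpus path is hypothetical), note that the calibration file is plain text, so you can assemble it from your own documents and recompute the importance matrix:
+
+ ```shell
+ # Build a calibration corpus from your own Spanish plain-text files
+ cat my_spanish_corpus/*.txt > train_data.txt
+ llama-imatrix -m ./rigochat-7b-v2-F16.gguf -f train_data.txt -c 1024
+ ```
+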
+ ## How to Get Started with the Model
+
+ You can run, for example:
+
+ ```shell
+ llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
+ ```
+
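+ Alternatively, to serve the model over HTTP with llama.cpp's `llama-server` (the context size and port below are illustrative choices, not from this card):
+
+ ```shell
+ # Start an HTTP server; the built-in web UI and API become available on the chosen port
+ llama-server -m ./rigochat-7b-v2-Q8_0.gguf -c 2048 --port 8080
+ ```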
 
  ## Evaluation