Transformers · GGUF · llama
TheBloke committed
Commit 19f0a31 · 1 Parent(s): d8786a2

Upload README.md

Files changed (1)
README.md +14 -14
README.md CHANGED
@@ -110,18 +110,18 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
- | [airoboros-l2-7b.Q2_K.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q2_K.gguf) | Q2_K | 2 | 2.83 GB | 5.33 GB | smallest, significant quality loss - not recommended for most purposes |
- | [airoboros-l2-7b.Q3_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q3_K_S.gguf) | Q3_K_S | 3 | 2.95 GB | 5.45 GB | very small, high quality loss |
- | [airoboros-l2-7b.Q3_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q3_K_M.gguf) | Q3_K_M | 3 | 3.30 GB | 5.80 GB | very small, high quality loss |
- | [airoboros-l2-7b.Q3_K_L.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q3_K_L.gguf) | Q3_K_L | 3 | 3.60 GB | 6.10 GB | small, substantial quality loss |
- | [airoboros-l2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q4_0.gguf) | Q4_0 | 4 | 3.83 GB | 6.33 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
- | [airoboros-l2-7b.Q4_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q4_K_S.gguf) | Q4_K_S | 4 | 3.86 GB | 6.36 GB | small, greater quality loss |
- | [airoboros-l2-7b.Q4_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q4_K_M.gguf) | Q4_K_M | 4 | 4.08 GB | 6.58 GB | medium, balanced quality - recommended |
- | [airoboros-l2-7b.Q5_0.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q5_0.gguf) | Q5_0 | 5 | 4.65 GB | 7.15 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
- | [airoboros-l2-7b.Q5_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q5_K_S.gguf) | Q5_K_S | 5 | 4.65 GB | 7.15 GB | large, low quality loss - recommended |
- | [airoboros-l2-7b.Q5_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q5_K_M.gguf) | Q5_K_M | 5 | 4.78 GB | 7.28 GB | large, very low quality loss - recommended |
- | [airoboros-l2-7b.Q6_K.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q6_K.gguf) | Q6_K | 6 | 5.53 GB | 8.03 GB | very large, extremely low quality loss |
- | [airoboros-l2-7b.Q8_0.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b.Q8_0.gguf) | Q8_0 | 8 | 7.16 GB | 9.66 GB | very large, extremely low quality loss - not recommended |
+ | [airoboros-l2-7b-2.2.Q2_K.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q2_K.gguf) | Q2_K | 2 | 2.83 GB | 5.33 GB | smallest, significant quality loss - not recommended for most purposes |
+ | [airoboros-l2-7b-2.2.Q3_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q3_K_S.gguf) | Q3_K_S | 3 | 2.95 GB | 5.45 GB | very small, high quality loss |
+ | [airoboros-l2-7b-2.2.Q3_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q3_K_M.gguf) | Q3_K_M | 3 | 3.30 GB | 5.80 GB | very small, high quality loss |
+ | [airoboros-l2-7b-2.2.Q3_K_L.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q3_K_L.gguf) | Q3_K_L | 3 | 3.60 GB | 6.10 GB | small, substantial quality loss |
+ | [airoboros-l2-7b-2.2.Q4_0.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q4_0.gguf) | Q4_0 | 4 | 3.83 GB | 6.33 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
+ | [airoboros-l2-7b-2.2.Q4_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q4_K_S.gguf) | Q4_K_S | 4 | 3.86 GB | 6.36 GB | small, greater quality loss |
+ | [airoboros-l2-7b-2.2.Q4_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q4_K_M.gguf) | Q4_K_M | 4 | 4.08 GB | 6.58 GB | medium, balanced quality - recommended |
+ | [airoboros-l2-7b-2.2.Q5_0.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q5_0.gguf) | Q5_0 | 5 | 4.65 GB | 7.15 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
+ | [airoboros-l2-7b-2.2.Q5_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q5_K_S.gguf) | Q5_K_S | 5 | 4.65 GB | 7.15 GB | large, low quality loss - recommended |
+ | [airoboros-l2-7b-2.2.Q5_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q5_K_M.gguf) | Q5_K_M | 5 | 4.78 GB | 7.28 GB | large, very low quality loss - recommended |
+ | [airoboros-l2-7b-2.2.Q6_K.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q6_K.gguf) | Q6_K | 6 | 5.53 GB | 8.03 GB | very large, extremely low quality loss |
+ | [airoboros-l2-7b-2.2.Q8_0.gguf](https://huggingface.co/TheBloke/Airoboros-L2-7B-2.2-GGUF/blob/main/airoboros-l2-7b-2.2.Q8_0.gguf) | Q8_0 | 8 | 7.16 GB | 9.66 GB | very large, extremely low quality loss - not recommended |
 
 **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
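As the table shows, each file's "Max RAM required" figure is its size plus roughly 2.5 GB of working overhead. To fetch a single quant file from the table rather than cloning the whole repo, a minimal sketch using `huggingface_hub` (a separate install, not part of the README being diffed here) might look like this:

```python
# Minimal sketch: download one GGUF quant file from the repo.
# Assumes `pip install huggingface_hub`; filename taken from the table above.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Airoboros-L2-7B-2.2-GGUF",
    filename="airoboros-l2-7b-2.2.Q4_K_M.gguf",  # the "recommended" medium quant
)
print(path)  # local cache path of the downloaded file
```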
 
@@ -135,7 +135,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
- ./main -ngl 32 -m airoboros-l2-7b.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "A chat.\nUSER: {prompt}\nASSISTANT:"
+ ./main -ngl 32 -m airoboros-l2-7b-2.2.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "A chat.\nUSER: {prompt}\nASSISTANT:"
 ```
 
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
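The same invocation can be reproduced from Python with the `llama-cpp-python` bindings (a separate package, not covered in this README excerpt); a rough sketch mirroring the flags above:

```python
# Rough sketch using llama-cpp-python (pip install llama-cpp-python).
# Mirrors the CLI flags above: -ngl 32 -> n_gpu_layers, -c 4096 -> n_ctx,
# --temp 0.7 -> temperature, --repeat_penalty 1.1 -> repeat_penalty.
from llama_cpp import Llama

llm = Llama(
    model_path="airoboros-l2-7b-2.2.Q4_K_M.gguf",
    n_ctx=4096,       # context length (-c 4096)
    n_gpu_layers=32,  # layers offloaded to GPU (-ngl 32); set to 0 for CPU only
)

# Prompt template from the command above, with a hypothetical user message.
prompt = "A chat.\nUSER: Write a haiku about llamas.\nASSISTANT:"
out = llm(prompt, max_tokens=256, temperature=0.7, repeat_penalty=1.1, stop=["USER:"])
print(out["choices"][0]["text"])
```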
@@ -175,7 +175,7 @@ CT_METAL=1 pip install ctransformers>=0.2.24 --no-binary ctransformers
 from ctransformers import AutoModelForCausalLM
 
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
- llm = AutoModelForCausalLM.from_pretrained("TheBloke/Airoboros-L2-7B-2.2-GGUF", model_file="airoboros-l2-7b.q4_K_M.gguf", model_type="llama", gpu_layers=50)
+ llm = AutoModelForCausalLM.from_pretrained("TheBloke/Airoboros-L2-7B-2.2-GGUF", model_file="airoboros-l2-7b-2.2.Q4_K_M.gguf", model_type="llama", gpu_layers=50)
 
 print(llm("AI is going to"))
 ```
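If you want tokens as they are generated rather than one final string, `ctransformers` also supports streaming via `stream=True`; a small sketch building on the example above, using the prompt template from the llama.cpp command (the user message here is illustrative):

```python
from ctransformers import AutoModelForCausalLM

# Same load call as in the diff above; gpu_layers=0 if no GPU is available.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Airoboros-L2-7B-2.2-GGUF",
    model_file="airoboros-l2-7b-2.2.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50,
)

prompt = "A chat.\nUSER: Explain GGUF in one sentence.\nASSISTANT:"

# stream=True yields text fragments as they are generated.
for text in llm(prompt, stream=True):
    print(text, end="", flush=True)
print()
```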
 