Update README.md
README.md
CHANGED
@@ -28,14 +28,13 @@ language:
 - sr
 - sv
 - uk
-library_name: adapter-transformers
 ---

 LLaMa 65B converted to ggml via LLaMa.cpp, then quantized to 4bit.

-
-Check https://github.com/ggerganov/llama.cpp#quantization for details on the different quantization types.
+Legacy is for llama.cpp setups older than https://github.com/ggerganov/llama.cpp/pull/1405, the regular is faster but does not work on old versions.

-I recommend the following settings when running as a good starting point:
+I recommend the following settings when running as a good starting point:
+```main.exe -m ggml-LLaMa-65B-q4_0.bin -n -1 -t 32 -c 2048 --temp 0.7 --repeat_penalty 1.2 --mirostat 2 --interactive-first --color```

 Be aware that LLaMa is a text generation model, not a conversational one, and as such you will have to prompt it differently than, for example, Vicuna or ChatGPT.
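The README's closing note about prompting can be illustrated with a small sketch. A base text-generation model continues whatever document it is given, so one common approach is to write the prompt as a transcript that ends where the answer should begin. The helper name `build_completion_prompt`, the speaker labels, and the header sentence below are hypothetical choices for illustration; none of them come from the README or from llama.cpp.

```python
def build_completion_prompt(exchanges, next_question,
                            user="User", assistant="Assistant"):
    """Format a dialogue as a plain-text transcript for a base model to continue.

    `exchanges` is a list of (question, answer) pairs used as worked examples.
    The speaker labels and header line are illustrative assumptions.
    """
    lines = [
        "Transcript of a conversation between a user and a helpful assistant.",
        "",
    ]
    for question, answer in exchanges:
        lines.append(f"{user}: {question}")
        lines.append(f"{assistant}: {answer}")
    # End on an open assistant turn so the most natural continuation is the answer.
    lines.append(f"{user}: {next_question}")
    lines.append(f"{assistant}:")
    return "\n".join(lines)


prompt = build_completion_prompt(
    [("What is the capital of France?", "Paris.")],
    "What is the capital of Sweden?",
)
print(prompt)
```

With `--interactive-first`, as in the recommended command, text shaped like this could be pasted in as the first input. How well it works depends on the examples chosen, so treat it as a starting point rather than a fixed recipe.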