update -ngl arg to count the output layer
#6 by owao - opened

README.md CHANGED
@@ -111,7 +111,7 @@ You can run EXAONE models locally using llama.cpp by following these steps:
 4. Generate result with greedy decoding.
     ```bash
     llama-cli -m EXAONE-4.0-32B-GGUF-Q4_K_M.gguf \
-        -fa -ngl \
+        -fa -ngl 65 \
         --temp 0.0 --top-k 1 \
         -f inputs.txt -no-cnv
     ```
@@ -124,7 +124,7 @@ You can run EXAONE models locally using llama.cpp by following these steps:
 3. Run llama-server with EXAONE 4.0 Jinja template. You can find the [chat template file](https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-32B-GGUF/blob/main/chat_template.jinja) in this repository.
     ```bash
     llama-server -m EXAONE-4.0-32B-Q4_K_M.gguf \
-        -c 131072 -fa -ngl \
+        -c 131072 -fa -ngl 65 \
         --temp 0.6 --top-p 0.95 \
         --jinja --chat-template-file chat_template.jinja \
         --host 0.0.0.0 --port 8820 \
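
For reference (not part of the change itself): in llama.cpp, `-ngl` / `--n-gpu-layers` sets how many model layers are offloaded to the GPU, and the output layer counts as one extra layer on top of the repeating transformer blocks. Assuming the 32B model has 64 blocks, `-ngl 65` is what fully offloads it, which matches the PR title. Assembled from the first hunk, the llama-cli command reads as follows with the change applied (the llama-server hunk is cut off mid-command by the diff context, so it is not repeated here):

```bash
# Greedy decoding with llama-cli (README step 4), with -ngl raised to 65
# so the output layer is offloaded along with the transformer blocks.
llama-cli -m EXAONE-4.0-32B-GGUF-Q4_K_M.gguf \
    -fa -ngl 65 \
    --temp 0.0 --top-k 1 \
    -f inputs.txt -no-cnv
```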