nuxlear owao committed
Commit 8b91580 · verified · 1 parent: b04e383

update -ngl arg to count the output layer (#6)


- update -ngl arg to count the output layer (e46235c7812188677b9dee397be152d0b897fcd1)


Co-authored-by: blakkd <[email protected]>
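
In llama.cpp, `-ngl` (`--n-gpu-layers`) sets how many layers to offload to the GPU, and that count includes the final output layer on top of the transformer blocks, hence the bump from 64 to 65 for this 64-block model. A minimal sketch of the corrected full-offload call, assuming the Q4_K_M file from this repo is in the working directory (same flags as the README diff below):

```bash
# -ngl counts the output layer too: 64 transformer blocks + 1 output layer = 65.
# A value larger than the model's total layer count also offloads everything.
llama-cli -m EXAONE-4.0-32B-GGUF-Q4_K_M.gguf \
    -fa -ngl 65 \
    --temp 0.0 --top-k 1 \
    -f inputs.txt -no-cnv
```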

Files changed (1):
  README.md (+2 −2)
README.md CHANGED
@@ -111,7 +111,7 @@ You can run EXAONE models locally using llama.cpp by following these steps:
 4. Generate result with greedy decoding.
 ```bash
 llama-cli -m EXAONE-4.0-32B-GGUF-Q4_K_M.gguf \
-    -fa -ngl 64 \
+    -fa -ngl 65 \
     --temp 0.0 --top-k 1 \
     -f inputs.txt -no-cnv
 ```
@@ -124,7 +124,7 @@ You can run EXAONE models locally using llama.cpp by following these steps:
 3. Run llama-server with EXAONE 4.0 Jinja template. You can find the [chat template file](https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-32B-GGUF/blob/main/chat_template.jinja) in this repository.
 ```bash
 llama-server -m EXAONE-4.0-32B-Q4_K_M.gguf \
-    -c 131072 -fa -ngl 64 \
+    -c 131072 -fa -ngl 65 \
     --temp 0.6 --top-p 0.95 \
     --jinja --chat-template-file chat_template.jinja \
     --host 0.0.0.0 --port 8820 \