second-state/Llama3.1-8B-Chinese-Chat-GGUF

@@ -50,7 +50,7 @@ tags:
     {{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
     ```
-- Context size: `4096`
+- Context size: `128000`
 - Run as LlamaEdge service
@@ -58,7 +58,7 @@ tags:
   wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama3.1-8B-Chinese-Chat-Q5_K_M.gguf \
     llama-api-server.wasm \
     --prompt-template llama-3-chat \
-    --ctx-size 4096 \
+    --ctx-size 128000 \
     --model-name Llama-3-8B-Chinese-Chat \
   ```
@@ -68,7 +68,7 @@ tags:
   wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama3.1-8B-Chinese-Chat-Q5_K_M.gguf \
     llama-chat.wasm \
     --prompt-template llama-3-chat \
-    --ctx-size 4096
+    --ctx-size 128000
   ```
 ## Quantized GGUF Models
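This change hard-codes `--ctx-size 128000` in both commands. Because the KV cache grows with the context size, a smaller value may be needed on memory-constrained machines. The sketch below is a hypothetical dry-run helper, not part of the model card: the `MODEL_FILE` and `CTX_SIZE` variables and the `build_api_server_cmd` function are illustrative assumptions. It only prints the `llama-api-server.wasm` invocation from the diff with the context size made configurable; it does not launch WasmEdge.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not from the README): prints the LlamaEdge
# API server command with a configurable context size. Nothing is executed.
set -euo pipefail

# Assumed variables for illustration; defaults mirror the updated README.
MODEL_FILE="${MODEL_FILE:-Llama3.1-8B-Chinese-Chat-Q5_K_M.gguf}"
# 128000 matches the new README value; set e.g. CTX_SIZE=4096 to shrink
# the KV cache if the full window does not fit in memory.
CTX_SIZE="${CTX_SIZE:-128000}"

build_api_server_cmd() {
  # Same flags as the api-server command in the diff, held in an array
  # so each argument stays intact.
  local args=(
    wasmedge --dir .:.
    --nn-preload "default:GGML:AUTO:${MODEL_FILE}"
    llama-api-server.wasm
    --prompt-template llama-3-chat
    --ctx-size "${CTX_SIZE}"
    --model-name Llama-3-8B-Chinese-Chat
  )
  # Join the arguments with spaces and print them for review.
  printf '%s\n' "${args[*]}"
}

build_api_server_cmd
```

For example, `CTX_SIZE=4096 bash print-cmd.sh` (file name assumed) prints the pre-change command. To actually run it, the helper could execute `"${args[@]}"` instead of printing it; keeping it as a dry run lets you check the flags first.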