Mungert committed on
Commit e3a7cde · verified · 1 Parent(s): 97f399a

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +41 -21
README.md CHANGED
@@ -18,8 +18,6 @@ pipeline_tag: text-generation
 
 # <span style="color: #7FFF7F;">RWKV7-Goose-World3-2.9B-HF GGUF Models</span>
 
- Note: you must use the latest llama.cpp (https://github.com/ggml-org/llama.cpp) to run this model.
-
 ## **Choosing the Right Model Format**
 
 Selecting the correct model format depends on your **hardware capabilities** and **memory constraints**.
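For context on the llama.cpp note in this hunk: below is a minimal sketch of loading one of these GGUF files with llama-cpp-python, the Python binding for llama.cpp. The quant filename, thread count, and sampling settings are illustrative placeholders, not values taken from this repository, and a recent build is assumed since RWKV7 support in llama.cpp is new.

```python
# Minimal sketch: run a GGUF quant of this model with llama-cpp-python.
# NOTE: the filename below is a placeholder; use whichever quant file you downloaded.
# Requires a recent llama-cpp-python / llama.cpp build (RWKV7 support is new).
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="RWKV7-Goose-World3-2.9B-Q4_K.gguf",  # placeholder path
    n_ctx=4096,       # context window; lower it to save memory
    n_threads=6,      # CPU threads used for inference
    n_gpu_layers=0,   # >0 offloads layers to GPU if the wheel was built with GPU support
)

out = llm(
    "Explain what a GGUF file is in one sentence.",
    max_tokens=64,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

The same caveat applies to the llama.cpp CLI tools: build them from a recent checkout of the repository linked above before pointing them at the GGUF file.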
@@ -27,7 +25,7 @@ Selecting the correct model format depends on your **hardware capabilities** and
 ### **BF16 (Brain Float 16) – Use if BF16 acceleration is available**
 - A 16-bit floating-point format designed for **faster computation** while retaining good precision.
 - Provides **similar dynamic range** as FP32 but with **lower memory usage**.
- - Recommended if your hardware supports **BF16 acceleration** (check your device’s specs).
+ - Recommended if your hardware supports **BF16 acceleration** (check your device's specs).
 - Ideal for **high-performance inference** with **reduced memory footprint** compared to FP32.
 
 📌 **Use BF16 if:**
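The bullet changed in this hunk advises checking whether your device actually has BF16 acceleration. One quick way to probe that from Python is sketched below, using PyTorch purely as a convenient capability check; PyTorch is an assumption here, not a dependency of these GGUF files.

```python
# Quick probe for BF16 support before picking the BF16 GGUF.
# PyTorch is used only as a convenient capability check; it is not needed to run the model.
import torch

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    print("CUDA device reports native BF16 support -> the BF16 GGUF is a good fit.")
else:
    # Some recent CPUs (e.g. with AVX512_BF16 or AMX) also accelerate BF16;
    # otherwise fall back to F16 or a quantized format (Q4_K / Q6_K / Q8_0).
    print("No CUDA BF16 detected -> consider F16 or a quantized format.")
```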
@@ -99,7 +97,7 @@ These models are optimized for **extreme memory efficiency**, making them ideal
 | Model Format | Precision | Memory Usage | Device Requirements | Best Use Case |
 |--------------|------------|---------------|----------------------|---------------|
 | **BF16** | Highest | High | BF16-supported GPU/CPUs | High-speed inference with reduced memory |
- | **F16** | High | High | FP16-supported devices | GPU inference when BF16 isn’t available |
+ | **F16** | High | High | FP16-supported devices | GPU inference when BF16 isn't available |
 | **Q4_K** | Medium Low | Low | CPU or Low-VRAM devices | Best for memory-constrained environments |
 | **Q6_K** | Medium | Moderate | CPU with more memory | Better accuracy while still being quantized |
 | **Q8_0** | High | Moderate | CPU or GPU with enough VRAM | Best accuracy among quantized models |
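To make the table's Memory Usage column concrete for a 2.9B-parameter model, a rough estimate can be computed from approximate bits-per-weight figures. The numbers below are assumptions for illustration; real GGUF files differ somewhat because some tensors stay at higher precision and the file carries metadata.

```python
# Rough weight-size estimates for a 2.9B-parameter model under different GGUF formats.
# Bits-per-weight values are approximations; actual GGUF sizes vary by tensor layout and metadata.
PARAMS = 2.9e9

approx_bits_per_weight = {
    "BF16": 16.0,
    "F16": 16.0,
    "Q8_0": 8.5,   # ~8 bits plus per-block scales
    "Q6_K": 6.6,
    "Q4_K": 4.8,
}

for fmt, bits in approx_bits_per_weight.items():
    gib = PARAMS * bits / 8 / 1024**3
    print(f"{fmt:5s} ≈ {gib:4.1f} GiB of weights")
```

Leave extra headroom beyond these figures for runtime buffers and context state.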
@@ -159,23 +157,45 @@ These models are optimized for **extreme memory efficiency**, making them ideal
 - Prefer IQ4_NL for better accuracy.
 
 # <span id="testllm" style="color: #7F7FFF;">🚀 If you find these models useful</span>
-
- Please click like ❤. I'd also really appreciate it if you could test my Network Monitor Assistant at 👉 [Network Monitor Assistant](https://freenetworkmonitor.click/dashboard).
-
- 💬 Click the **chat icon** (bottom right of the main and dashboard pages). Choose an LLM; toggle between the LLM types TurboLLM -> FreeLLM -> TestLLM.
-
- ### What I'm Testing
-
- I'm experimenting with **function calling** against my network monitoring service, using small open-source models. I'm interested in the question "How small can it go and still function?"
-
- 🟡 **TestLLM** – Runs the current testing model using llama.cpp on 6 threads of a CPU VM (it should take about 15s to load; inference is quite slow and it only processes one user prompt at a time; still working on scaling!). If you're curious, I'd be happy to share how it works!
-
- ### The Other Available AI Assistants
-
- 🟢 **TurboLLM** – Uses **gpt-4o-mini**. Fast! Note: tokens are limited since OpenAI models are pricey, but you can [Login](https://freenetworkmonitor.click) or [Download](https://freenetworkmonitor.click/download) the Free Network Monitor agent to get more tokens; alternatively, use the TestLLM.
-
- 🔵 **HugLLM** – Runs **open-source Hugging Face models**. Fast; runs small models (≈8B), hence lower quality. Get 2x more tokens (subject to Hugging Face API availability).
-
+ ❤ **Please click "Like" if you find this useful!**
+ Help me test my **AI-Powered Network Monitor Assistant** with **quantum-ready security checks**:
+ 👉 [Free Network Monitor](https://freenetworkmonitor.click/dashboard)
+
+ 💬 **How to test**:
+ 1. Click the **chat icon** (bottom right on any page)
+ 2. Choose an **AI assistant type**:
+    - `TurboLLM` (GPT-4-mini)
+    - `FreeLLM` (Open-source)
+    - `TestLLM` (Experimental CPU-only)
+
+ ### **What I'm Testing**
+ I'm pushing the limits of **small open-source models for AI network monitoring**, specifically:
+ - **Function calling** against live network services
+ - **How small can a model go** while still handling:
+   - Automated **Nmap scans**
+   - **Quantum-readiness checks**
+   - **Metasploit integration**
+
+ 🟡 **TestLLM** – Current experimental model (llama.cpp on 6 CPU threads):
+ - ✅ **Zero-configuration setup**
+ - ⏳ 30s load time (slow inference but **no API costs**)
+ - 🔧 **Help wanted!** If you're into **edge-device AI**, let's collaborate!
+
+ ### **Other Assistants**
+ 🟢 **TurboLLM** – Uses **gpt-4-mini** for:
+ - **Real-time network diagnostics**
+ - **Automated penetration testing** (Nmap/Metasploit)
+ - 🔑 Get more tokens by [downloading our Free Network Monitor Agent](https://freenetworkmonitor.click/download)
+
+ 🔵 **HugLLM** – Open-source models (≈8B params):
+ - **2x more tokens** than TurboLLM
+ - **AI-powered log analysis**
+ - 🌐 Runs on the Hugging Face Inference API
+
+ ### 💡 **Example AI Commands to Test**:
+ 1. `"Give me info on my website's SSL certificate"`
+ 2. `"Check if my server is using quantum-safe encryption for communication"`
+ 3. `"Run a quick Nmap vulnerability test"`