Upload README.md with huggingface_hub
# <span style="color: #7FFF7F;">RWKV7-Goose-World3-2.9B-HF GGUF Models</span>
Note: you must use the latest [llama.cpp](https://github.com/ggml-org/llama.cpp) build to run this model.
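
If you want a quick sanity check of a downloaded file, here is a minimal sketch using the llama-cpp-python bindings rather than the llama.cpp CLI — that choice, and the GGUF filename, are assumptions for illustration only:

```python
# Minimal sketch (assumption: llama-cpp-python is installed and built against a
# recent llama.cpp with RWKV7 support). The filename below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="rwkv7-goose-world3-2.9b.Q4_K_M.gguf",  # placeholder filename
    n_ctx=4096,   # context window size
    n_threads=6,  # CPU threads to use for inference
)

output = llm("User: What is the capital of France?\n\nAssistant:", max_tokens=64)
print(output["choices"][0]["text"])
```
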
## **Choosing the Right Model Format**

Selecting the correct model format depends on your **hardware capabilities** and **memory constraints**.

### **BF16 (Brain Float 16) – Use if BF16 acceleration is available**
- A 16-bit floating-point format designed for **faster computation** while retaining good precision.
- Provides a **similar dynamic range** to FP32 but with **lower memory usage**.
- Recommended if your hardware supports **BF16 acceleration** (check your device's specs; see the sketch after this list).
- Ideal for **high-performance inference** with a **reduced memory footprint** compared to FP32.
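
As a rough aid for the "check your device's specs" step above, the following sketch queries BF16 support; it assumes PyTorch is available, which is purely an illustrative choice and not a requirement of these GGUF files:

```python
# Sketch only (assumes PyTorch): report whether the local hardware advertises
# BF16 support, to help decide between the BF16 and F16/quantized files.
import torch

def bf16_available() -> bool:
    # NVIDIA GPUs from Ampere (compute capability 8.0) onward support BF16.
    if torch.cuda.is_available():
        return torch.cuda.is_bf16_supported()
    # On CPU, PyTorch can run bfloat16 tensors, but real acceleration depends
    # on the chip (e.g. AVX-512 BF16, ARMv8.6+), so treat this as a soft signal.
    try:
        x = torch.ones(2, dtype=torch.bfloat16)
        return bool((x + x).dtype == torch.bfloat16)
    except (RuntimeError, TypeError):
        return False

if __name__ == "__main__":
    print("BF16 looks usable" if bf16_available() else "Prefer F16 or a quantized format")
```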

**Use BF16 if:**

| Model Format | Precision | Memory Usage | Device Requirements | Best Use Case |
|--------------|-----------|--------------|----------------------|---------------|
| **BF16** | Highest | High | BF16-supported GPU/CPUs | High-speed inference with reduced memory |
| **F16** | High | High | FP16-supported devices | GPU inference when BF16 isn't available |
| **Q4_K** | Medium-Low | Low | CPU or low-VRAM devices | Best for memory-constrained environments |
| **Q6_K** | Medium | Moderate | CPU with more memory | Better accuracy while still being quantized |
| **Q8_0** | High | Moderate | CPU or GPU with enough VRAM | Best accuracy among quantized models |
- Prefer IQ4_NL for better accuracy.
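
Once you have picked a quantization from the table above, one way to fetch the corresponding file is with `huggingface_hub` — a sketch only; the repo id and filename below are placeholders, not the actual names in this repository:

```python
# Sketch: download a chosen GGUF quantization via huggingface_hub.
# Both repo_id and filename are illustrative placeholders.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="your-namespace/RWKV7-Goose-World3-2.9B-HF-GGUF",  # placeholder
    filename="rwkv7-goose-world3-2.9b.IQ4_NL.gguf",            # placeholder
)
print(f"Saved to: {model_path}")
```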

# <span id="testllm" style="color: #7F7FFF;">If you find these models useful</span>

**Please click "Like" if you find this useful!**
Help me test my **AI-Powered Network Monitor Assistant** with **quantum-ready security checks**:
[Free Network Monitor](https://freenetworkmonitor.click/dashboard)

**How to test**:
1. Click the **chat icon** (bottom right on any page)
2. Choose an **AI assistant type**:
   - `TurboLLM` (GPT-4-mini)
   - `FreeLLM` (Open-source)
   - `TestLLM` (Experimental CPU-only)

### **What I'm Testing**

I'm pushing the limits of **small open-source models for AI network monitoring**, specifically:
- **Function calling** against live network services
- **How small can a model go** while still handling:
  - Automated **Nmap scans**
  - **Quantum-readiness checks**
  - **Metasploit integration**

**TestLLM** – Current experimental model (llama.cpp on 6 CPU threads):
- **Zero-configuration setup**
- 30s load time (slow inference but **no API costs**)
- **Help wanted!** If you're into **edge-device AI**, let's collaborate!

### **Other Assistants**

**TurboLLM** – Uses **gpt-4-mini** for:
- **Real-time network diagnostics**
- **Automated penetration testing** (Nmap/Metasploit)
- Get more tokens by [downloading our Free Network Monitor Agent](https://freenetworkmonitor.click/download)

**HugLLM** – Open-source models (~8B params):
- **2x more tokens** than TurboLLM
- **AI-powered log analysis**
- Runs on the Hugging Face Inference API

### **Example AI Commands to Test**:
1. `"Give me info on my website's SSL certificate"`
2. `"Check if my server is using quantum-safe encryption for communication"`
3. `"Run a quick Nmap vulnerability test"`