Update README.md
README.md CHANGED
```diff
@@ -4,9 +4,24 @@ language:
 - zh
 - en
 pipeline_tag: text-generation
-library_name:
+library_name: exllamav2
+base_model:
+- THUDM/GLM-4-9B-0414
 ---
-
+# GLM-4-9B-0414-exl2
+Original model: [GLM-4-9B-0414](https://huggingface.co/THUDM/GLM-4-9B-0414) by [Z.ai & THUKEG](https://huggingface.co/THUDM)
+
+## Quants
+[4bpw h6 (main)](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/main)
+[4.5bpw h6](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/4.5bpw-h6)
+[5bpw h6](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/5bpw-h6)
+[6bpw h6](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/6bpw-h6)
+[8bpw h8](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/8bpw-h8)
+
+## Quantization notes
+Made with Exllamav2 0.2.9 with the default dataset. These quants require Exllamav2 0.2.9 or newer.
+These quants can be used with TabbyAPI or Text-Generation-WebUI with an RTX GPU (Windows) or RTX/ROCm (Linux).
+Ensure the quant fits your GPU VRAM, since Exllamav2 doesn't support RAM offloading.
 # GLM-4-9B-0414
 
 ## Introduction
@@ -393,4 +408,4 @@ while True:
 
 [2] [Agentless v1.5.0](https://github.com/OpenAutoCoder/Agentless) used [BGE](https://github.com/FlagOpen/FlagEmbedding/blob/master/README.md) as the embedding model and [FAISS](https://github.com/facebookresearch/faiss) for similarity search. To speed up patch verification while maintaining performance, the timeout for running a single instance was changed from the default 300s to 180s.
 
-[3] [OpenHands v0.29.1](https://github.com/All-Hands-AI/OpenHands/tree/main) did not use YaRN context extension but limited runs to a maximum of 60 iterations and summarized the history to prevent exceeding the 32K context limit. Summarization was configured as `llm_config="condenser", keep_first=1, max_size=32`. No retries on failed trajectories.
+[3] [OpenHands v0.29.1](https://github.com/All-Hands-AI/OpenHands/tree/main) did not use YaRN context extension but limited runs to a maximum of 60 iterations and summarized the history to prevent exceeding the 32K context limit. Summarization was configured as `llm_config="condenser", keep_first=1, max_size=32`. No retries on failed trajectories.
```
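The quantization notes warn that an exl2 quant must fit entirely in GPU VRAM. A rough rule of thumb for picking a bits-per-weight (bpw) branch is weight size ≈ parameter count × bpw / 8 bytes. The sketch below is a back-of-envelope estimate only; `quant_weight_gb` is a hypothetical helper, and the figure covers weights alone, ignoring the KV cache and runtime overhead:

```python
def quant_weight_gb(n_params: float, bpw: float) -> float:
    """Approximate on-disk/VRAM weight size of an exl2 quant in GB.

    n_params: number of model parameters (e.g. 9e9 for a 9B model)
    bpw: average bits per weight of the quant branch
    """
    # bits -> bytes (divide by 8), bytes -> GB (divide by 1e9)
    return n_params * bpw / 8 / 1e9


# A 9B model at 4.0 bpw needs roughly 4.5 GB for weights alone,
# and about 9.0 GB at 8.0 bpw; leave headroom for context cache.
print(round(quant_weight_gb(9e9, 4.0), 2))  # 4.5
print(round(quant_weight_gb(9e9, 8.0), 2))  # 9.0
```

In practice, add a couple of GB of headroom on top of this estimate for the KV cache at the context length you plan to use.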