cgus committed (verified)
Commit 1a33b16 · 1 parent: be6dbb3

Update README.md

Files changed (1): README.md (+18 -3)
@@ -4,9 +4,24 @@ language:
 - zh
 - en
 pipeline_tag: text-generation
-library_name: transformers
+library_name: exllamav2
+base_model:
+- THUDM/GLM-4-9B-0414
 ---
-
+# GLM-4-9B-0414-exl2
+Original model: [GLM-4-9B-0414](https://huggingface.co/THUDM/GLM-4-9B-0414) by [Z.ai & THUKEG](https://huggingface.co/THUDM)
+
+## Quants
+[4bpw h6 (main)](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/main)
+[4.5bpw h6](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/4.5bpw-h6)
+[5bpw h6](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/5bpw-h6)
+[6bpw h6](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/6bpw-h6)
+[8bpw h8](https://huggingface.co/cgus/GLM-4-9B-0414-exl2/tree/8bpw-h8)
+
+## Quantization notes
+Made with Exllamav2 0.2.9 using the default calibration dataset; these quants require Exllamav2 0.2.9 or newer.
+They can be used with TabbyAPI or Text-Generation-WebUI on RTX GPUs (Windows) or RTX/ROCm GPUs (Linux).
+Make sure the model fits your GPU VRAM, since Exllamav2 doesn't support RAM offloading.
 # GLM-4-9B-0414
 
 ## Introduction
@@ -393,4 +408,4 @@ while True:
 
 [2] [Agentless v1.5.0](https://github.com/OpenAutoCoder/Agentless) used [BGE](https://github.com/FlagOpen/FlagEmbedding/blob/master/README.md) as the embedding model and [FAISS](https://github.com/facebookresearch/faiss) for similarity search. To speed up patch verification while maintaining performance, the timeout for running a single instance was changed from the default 300s to 180s.
 
-[3] [OpenHands v0.29.1](https://github.com/All-Hands-AI/OpenHands/tree/main) did not use YaRN context extension but limited runs to a maximum of 60 iterations and summarized the history to prevent exceeding the 32K context limit. Summarization was configured as `llm_config="condenser", keep_first=1, max_size=32`. No retries on failed trajectories.
+[3] [OpenHands v0.29.1](https://github.com/All-Hands-AI/OpenHands/tree/main) did not use YaRN context extension but limited runs to a maximum of 60 iterations and summarized the history to prevent exceeding the 32K context limit. Summarization was configured as `llm_config="condenser", keep_first=1, max_size=32`. No retries on failed trajectories.
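
The quantization notes added above advise checking that a quant fits your GPU VRAM, since Exllamav2 cannot offload weights to system RAM. Weight size scales linearly with bits per weight (roughly params × bpw / 8 bytes), so a quick back-of-envelope check is possible. A minimal sketch — the helper and the round 9B parameter figure are illustrative, not part of Exllamav2, and KV cache plus runtime overhead need extra VRAM on top:

```python
def quant_size_gib(params_billions: float, bpw: float) -> float:
    """Rough weight-only footprint in GiB for a quantized model.

    Illustrative estimate: parameters * bits-per-weight / 8 bytes.
    Ignores KV cache and framework overhead, which require extra VRAM.
    """
    return params_billions * 1e9 * bpw / 8 / 2**30

# Weight-only estimates for a ~9B-parameter model at the listed bitrates:
for bpw in (4.0, 4.5, 5.0, 6.0, 8.0):
    print(f"{bpw} bpw ~ {quant_size_gib(9.0, bpw):.1f} GiB")
```

For example, at 4 bpw a 9B model needs roughly 4.2 GiB for weights alone, while the 8bpw h8 quant needs about twice that; pick the highest bitrate whose estimate, plus headroom for cache, stays under your GPU's VRAM.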