Commit 24e9c5d · Parent: a384632 · committed by Thireus

Update README.md

Files changed (1): README.md (+10, -8)
README.md CHANGED
````diff
@@ -45,19 +45,21 @@ git clone https://github.com/Thireus/GGUF-Tool-Suite
 # Download model quant mix from recipe file:
 cd GGUF-Tool-Suite
 rm -f download.conf # Make sure to copy the relevant download.conf for the model before running quant_assign.py
-cp -f models/DeepSeek-R1-0528/download.conf . # Use the download.conf of the chosen model
+cp -f models/GLM-4.5/download.conf . # Use the download.conf of the chosen model
 mkdir -p kitchen && cd kitchen
-../quant_downloader.sh ../recipe_examples/DeepSeek-R1-0528.THIREUS-1.9364bpw-4.3533ppl.151GB-GGUF_11GB-GPU_140GB-CPU.3c88ec6_9fd615d.recipe
+../quant_downloader.sh ../recipe_examples/GLM-4.5.ROOT-2.0085bpw-5.2486ppl.83GB-GGUF_7GB-GPU_76GB-CPU.a02563d_cdb0394.recipe
 
 # Other recipe examples can be found at https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
 
 # Launch ik_llama's llama-server:
 ulimit -n 99999 # Lifts "too many open files" limitation on Linux
 ~/ik_llama.cpp/build/bin/llama-server \
-  -m DeepSeek-R1-0528-THIREUS-BF16-SPECIAL_TENSOR-00001-of-01148.gguf \
-  -mla 3 -fa -amb 512 -fmoe -ctk f16 -c 4096 -ngl 99 \
-  -ot "blk\.(3|4|5|6)\.ffn_.*=CUDA0" \
-  -ot "blk\.(7|8|9|10)\.ffn_.*=CUDA1" \
+  -m GLM-4.5-THIREUS-BF16-SPECIAL_TENSOR-00001-of-01762.gguf \
+  -fa -fmoe -ctk f16 -c 4096 -ngl 99 \
+  -ot "blk\.([0-9]|[1-2][0-9]|3[0-6])\.ffn_.*=CUDA0" \
+  -ot "blk\.(37|38|39|[4-6][0-9]|7[0-2])\.ffn_.*=CUDA1" \
+  -ot "blk\.(7[3-9])\.ffn_.*=CUDA2" \
+  -ot "blk\.(8[0-9]|90|91|92)\.ffn_.*=CPU" \
   -ot exps=CPU -b 2048 -ub 1024 --warmup-batch --no-mmap --threads 36 \
   --main-gpu 0
 ```
@@ -76,9 +78,9 @@ ulimit -n 99999 # Lifts "too many open files" limitation on Linux
 
 ## 📊 How does it compare to other GGUFs?
 
-Here’s how DeepSeek-R1-0528 quantized with **Thireus’ GGUF Tool Suite** stacks up against other quantizers (lower perplexity = better at equal or lower bpw):
+Here’s how GLM-4.5 quantized with **Thireus’ GGUF Tool Suite** stacks up against other quantizers (lower perplexity = better at equal or lower bpw):
 
-![PPLs Compared With Others](https://github.com/Thireus/GGUF-Tool-Suite/raw/main/ppl_graphs/DeepSeek-R1-0528.svg)
+![PPLs Compared With Others](https://github.com/Thireus/GGUF-Tool-Suite/raw/main/ppl_graphs/GLM-4.5.svg)
 
 > _Note: The `recipe_examples` files illustrate good recipes. The Tool Suite computes the optimal ppl/bpw curve for you — just specify your target RAM, VRAM, and quant types, and `quant_assign.py` finds the best mix._
 
````
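A side note on the new recipe filename used above: it appears to encode the headline numbers of the quant mix. The field meanings in this sketch are an assumption inferred from the two example filenames in this diff (bits per weight, perplexity, total GGUF size, and the GPU/CPU split), not taken from Tool Suite documentation:

```python
import re

# Assumption: field meanings are inferred from the example recipe filenames in
# this diff; the Tool Suite itself may define the naming convention differently.
name = ("GLM-4.5.ROOT-2.0085bpw-5.2486ppl."
        "83GB-GGUF_7GB-GPU_76GB-CPU.a02563d_cdb0394.recipe")

m = re.search(
    r"(?P<bpw>[\d.]+)bpw-(?P<ppl>[\d.]+)ppl\."
    r"(?P<gguf>\d+)GB-GGUF_(?P<gpu>\d+)GB-GPU_(?P<cpu>\d+)GB-CPU",
    name,
)
fields = {k: float(v) for k, v in m.groupdict().items()}
print(fields)
# {'bpw': 2.0085, 'ppl': 5.2486, 'gguf': 83.0, 'gpu': 7.0, 'cpu': 76.0}

# Sanity check: the GPU + CPU split adds up to the total GGUF size
# (7 + 76 = 83 here; the old DeepSeek-R1-0528 example: 11 + 140 = 151).
assert fields["gpu"] + fields["cpu"] == fields["gguf"]
```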
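The `-ot` (tensor override) flags in the updated `llama-server` command route tensors whose names match a regular expression to a given backend. As a quick sanity check of which FFN blocks each of the four new patterns claims, here is a small self-contained sketch; it only evaluates the regexes, `ffn_gate_exps.weight` is just an illustrative tensor name, and the 93-block count (blk.0 to blk.92) is read off the last pattern rather than from the model:

```python
import re

# The four explicit -ot patterns from the updated command, in order.
rules = [
    (r"blk\.([0-9]|[1-2][0-9]|3[0-6])\.ffn_.*", "CUDA0"),
    (r"blk\.(37|38|39|[4-6][0-9]|7[0-2])\.ffn_.*", "CUDA1"),
    (r"blk\.(7[3-9])\.ffn_.*", "CUDA2"),
    (r"blk\.(8[0-9]|90|91|92)\.ffn_.*", "CPU"),
]

assignment = {}
for blk in range(93):  # blocks 0..92, read off the last pattern above
    name = f"blk.{blk}.ffn_gate_exps.weight"  # illustrative tensor name
    for pattern, device in rules:
        if re.search(pattern, name):
            # The block ranges are disjoint, so rule order does not matter here.
            assignment.setdefault(device, []).append(blk)
            break

for device, blocks in assignment.items():
    print(f"{device}: blk.{blocks[0]}-{blocks[-1]} ({len(blocks)} blocks)")
# CUDA0: blk.0-36 (37 blocks)
# CUDA1: blk.37-72 (36 blocks)
# CUDA2: blk.73-79 (7 blocks)
# CPU: blk.80-92 (13 blocks)
```

The trailing `-ot exps=CPU` in the command presumably acts as a catch-all for expert tensors not already claimed by the explicit block ranges above.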