Commit 2b1fd3c · verified · committed by Fentible · 1 parent: c4fe12f

Update README.md

Files changed (1):
  1. README.md +40 -30
README.md CHANGED
@@ -382,178 +382,184 @@ a:hover {
  </thead>
  <tbody>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ1_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ1_S.gguf">GGUF</a></td>
  <td>IQ1_S</td>
  <td style="text-align: right;">5.27 GB</td>
  <td>Lowest quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ1_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ1_M.gguf">GGUF</a></td>
  <td>IQ1_M</td>
  <td style="text-align: right;">5.75 GB</td>
  <td>Extremely low quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ2_XXS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ2_XXS.gguf">GGUF</a></td>
  <td>IQ2_XXS</td>
  <td style="text-align: right;">6.55 GB</td>
  <td>Very low quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ2_XS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ2_XS.gguf">GGUF</a></td>
  <td>IQ2_XS</td>
  <td style="text-align: right;">7.21 GB</td>
  <td>Low quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ2_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ2_S.gguf">GGUF</a></td>
  <td>IQ2_S</td>
  <td style="text-align: right;">7.48 GB</td>
  <td>Low quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ2_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ2_M.gguf">GGUF</a></td>
  <td>IQ2_M</td>
  <td style="text-align: right;">8.11 GB</td>
  <td>Relatively low quality, uses SOTA techniques to be surprisingly usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q2_K.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q2_K.gguf">GGUF</a></td>
  <td>Q2_K</td>
  <td style="text-align: right;">8.89 GB</td>
  <td>Very low quality but surprisingly usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ3_XXS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ3_XXS.gguf">GGUF</a></td>
  <td>IQ3_XXS</td>
  <td style="text-align: right;">9.28 GB</td>
  <td>Lower quality, new method with decent performance, comparable to Q3 quants.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q2_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q2_K_L.gguf">GGUF</a></td>
  <td>Q2_K_L</td>
  <td style="text-align: right;">9.55 GB</td>
  <td>Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ3_XS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ3_XS.gguf">GGUF</a></td>
  <td>IQ3_XS</td>
  <td style="text-align: right;">9.91 GB</td>
  <td>Lower quality, new method with decent performance, slightly better than Q3_K_S.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ3_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ3_S.gguf">GGUF</a></td>
  <td>IQ3_S</td>
  <td style="text-align: right;">10.4 GB</td>
  <td>Lower quality, slightly better than IQ3_XS.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q3_K_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q3_K_S.gguf">GGUF</a></td>
  <td>Q3_K_S</td>
  <td style="text-align: right;">10.4 GB</td>
  <td>Low quality, not recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ3_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ3_M.gguf">GGUF</a></td>
  <td>IQ3_M</td>
  <td style="text-align: right;">10.7 GB</td>
  <td>Medium-low quality, new method with decent performance comparable to Q3_K_M.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q3_K_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q3_K_M.gguf">GGUF</a></td>
  <td>Q3_K_M</td>
  <td style="text-align: right;">11.5 GB</td>
  <td>Lower quality but usable, good for low RAM availability.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q3_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q3_K_L.gguf">GGUF</a></td>
  <td>Q3_K_L</td>
  <td style="text-align: right;">12.4 GB</td>
  <td>Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ4_XS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ4_XS.gguf">GGUF</a></td>
  <td>IQ4_XS</td>
  <td style="text-align: right;">12.8 GB</td>
  <td>Decent quality, smaller than Q4_K_S with similar performance, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.IQ4_NL.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-IQ4_NL.gguf">GGUF</a></td>
  <td>IQ4_NL</td>
  <td style="text-align: right;">13.5 GB</td>
  <td>Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q4_0.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q4_0.gguf">GGUF</a></td>
  <td>Q4_0</td>
  <td style="text-align: right;">13.5 GB</td>
  <td>Legacy format, offers online repacking for ARM and AVX CPU inference.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q4_K_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q4_K_S.gguf">GGUF</a></td>
  <td>Q4_K_S</td>
  <td style="text-align: right;">13.5 GB</td>
  <td>Slightly lower quality with more space savings, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q4_K_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q4_K_M.gguf">GGUF</a></td>
  <td>Q4_K_M</td>
  <td style="text-align: right;">14.3 GB</td>
  <td>Good quality, default size for most use cases, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q4_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q4_K_L.gguf">GGUF</a></td>
  <td>Q4_K_L</td>
  <td style="text-align: right;">14.8 GB</td>
  <td>Uses Q8_0 for embed and output weights. Good quality, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q4_1.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q4_1.gguf">GGUF</a></td>
  <td>Q4_1</td>
  <td style="text-align: right;">14.9 GB</td>
  <td>Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q5_K_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q5_K_S.gguf">GGUF</a></td>
  <td>Q5_K_S</td>
  <td style="text-align: right;">16.3 GB</td>
  <td>High quality, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q5_K_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q5_K_M.gguf">GGUF</a></td>
  <td>Q5_K_M</td>
  <td style="text-align: right;">16.8 GB</td>
  <td>High quality, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q5_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q5_K_L.gguf">GGUF</a></td>
  <td>Q5_K_L</td>
  <td style="text-align: right;">17.2 GB</td>
  <td>Uses Q8_0 for embed and output weights. High quality, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q6_K.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q6_K.gguf">GGUF</a></td>
  <td>Q6_K</td>
  <td style="text-align: right;">19.3 GB</td>
  <td>Very high quality, near perfect, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q6_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q6_K_L.gguf">GGUF</a></td>
  <td>Q6_K_L</td>
  <td style="text-align: right;">19.7 GB</td>
  <td>Uses Q8_0 for embed and output weights. Very high quality, near perfect, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.Q8_0.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-Q8_0.gguf">GGUF</a></td>
  <td>Q8_0</td>
  <td style="text-align: right;">25.1 GB</td>
  <td>Extremely high quality, generally unneeded but max available quant.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1.FP16.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-UD-Q8_K_XL.gguf">GGUF</a></td>
+ <td>Q8_K_XL</td>
+ <td style="text-align: right;">29 GB</td>
+ <td>Uses FP16 for embed and output weights via Unsloth Dynamic 2.0, near perfect quality.</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1-GGUF/resolve/main/Cthulhu-24B-v1.1-FP16.gguf">GGUF</a></td>
  <td>FP16</td>
  <td style="text-align: right;">47.2 GB</td>
- <td>Full BF16 weights.</td>
+ <td>Full BF16 weights, maximum quality.</td>
  </tr>
  <tr>
  <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.1/tree/main">SAFE</a></td>
@@ -566,6 +572,10 @@ a:hover {
 
  <p>If you need a quant that isn't uploaded you can open a request.</p>
 
+ <p>Here is a useful tool which allows you to recreate UD quants: <a href="https://github.com/electroglyph/quant_clone">https://github.com/electroglyph/quant_clone</a></p>
+
+ <img src="https://i.imgur.com/YnTHoO1.png" width="800"></img>
+
  Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
  <img src="https://www.nethype.de/huggingface_embed/quantpplgraph.png"></img>
  And here are Artefact2's thoughts on the matter: <a href="https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9">https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9</a>
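The file sizes in the table above map roughly onto bits per weight, which is a quick way to sanity-check where a quant sits on the quality curve. A minimal sketch, assuming roughly 24 billion parameters (an approximation for this 24B model) and decimal gigabytes, and ignoring metadata overhead:

```python
# Rough bits-per-weight estimate for the quant file sizes listed above.
# Assumes ~24e9 parameters and decimal GB (both approximations); ignores
# GGUF metadata overhead, so results are ballpark figures only.
PARAMS = 24e9

def bits_per_weight(size_gb: float) -> float:
    return size_gb * 1e9 * 8 / PARAMS

# A few sizes taken from the table:
for name, gb in [("Q4_K_M", 14.3), ("Q8_0", 25.1), ("FP16", 47.2)]:
    print(f"{name}: ~{bits_per_weight(gb):.2f} bpw")
```

Under these assumptions Q4_K_M lands near 4.8 bpw and Q8_0 near 8.4 bpw, consistent with their names; the small excess over the nominal bit width comes from scale factors and higher-precision tensors.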