Fentible committed on
Commit 2d7feef · verified · 1 parent: 202123c

Update README.md

Files changed (1): README.md (+40 −30)
README.md CHANGED
@@ -379,178 +379,184 @@ a:hover {
  </thead>
  <tbody>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ1_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ1_S.gguf">GGUF</a></td>
  <td>IQ1_S</td>
  <td style="text-align: right;">5.27 GB</td>
  <td>Lowest quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ1_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ1_M.gguf">GGUF</a></td>
  <td>IQ1_M</td>
  <td style="text-align: right;">5.75 GB</td>
  <td>Extremely low quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ2_XXS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ2_XXS.gguf">GGUF</a></td>
  <td>IQ2_XXS</td>
  <td style="text-align: right;">6.55 GB</td>
  <td>Very low quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ2_XS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ2_XS.gguf">GGUF</a></td>
  <td>IQ2_XS</td>
  <td style="text-align: right;">7.21 GB</td>
  <td>Low quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ2_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ2_S.gguf">GGUF</a></td>
  <td>IQ2_S</td>
  <td style="text-align: right;">7.48 GB</td>
  <td>Low quality, uses SOTA techniques to be usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ2_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ2_M.gguf">GGUF</a></td>
  <td>IQ2_M</td>
  <td style="text-align: right;">8.11 GB</td>
  <td>Relatively low quality, uses SOTA techniques to be surprisingly usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q2_K.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q2_K.gguf">GGUF</a></td>
  <td>Q2_K</td>
  <td style="text-align: right;">8.89 GB</td>
  <td>Very low quality but surprisingly usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ3_XXS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ3_XXS.gguf">GGUF</a></td>
  <td>IQ3_XXS</td>
  <td style="text-align: right;">9.28 GB</td>
  <td>Lower quality, new method with decent performance, comparable to Q3 quants.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q2_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q2_K_L.gguf">GGUF</a></td>
  <td>Q2_K_L</td>
  <td style="text-align: right;">9.55 GB</td>
  <td>Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ3_XS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ3_XS.gguf">GGUF</a></td>
  <td>IQ3_XS</td>
  <td style="text-align: right;">9.91 GB</td>
  <td>Lower quality, new method with decent performance, slightly better than Q3_K_S.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ3_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ3_S.gguf">GGUF</a></td>
  <td>IQ3_S</td>
  <td style="text-align: right;">10.4 GB</td>
  <td>Lower quality, slightly better than IQ3_XS.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q3_K_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q3_K_S.gguf">GGUF</a></td>
  <td>Q3_K_S</td>
  <td style="text-align: right;">10.4 GB</td>
  <td>Low quality, not recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ3_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ3_M.gguf">GGUF</a></td>
  <td>IQ3_M</td>
  <td style="text-align: right;">10.7 GB</td>
  <td>Medium-low quality, new method with decent performance comparable to Q3_K_M.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q3_K_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q3_K_M.gguf">GGUF</a></td>
  <td>Q3_K_M</td>
  <td style="text-align: right;">11.5 GB</td>
  <td>Lower quality but usable, good for low RAM availability.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q3_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q3_K_L.gguf">GGUF</a></td>
  <td>Q3_K_L</td>
  <td style="text-align: right;">12.4 GB</td>
  <td>Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ4_XS.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ4_XS.gguf">GGUF</a></td>
  <td>IQ4_XS</td>
  <td style="text-align: right;">12.8 GB</td>
  <td>Decent quality, smaller than Q4_K_S with similar performance, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.IQ4_NL.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ4_NL.gguf">GGUF</a></td>
  <td>IQ4_NL</td>
  <td style="text-align: right;">13.5 GB</td>
  <td>Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q4_0.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_0.gguf">GGUF</a></td>
  <td>Q4_0</td>
  <td style="text-align: right;">13.5 GB</td>
  <td>Legacy format, offers online repacking for ARM and AVX CPU inference.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q4_K_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_K_S.gguf">GGUF</a></td>
  <td>Q4_K_S</td>
  <td style="text-align: right;">13.5 GB</td>
  <td>Slightly lower quality with more space savings, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q4_K_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_K_M.gguf">GGUF</a></td>
  <td>Q4_K_M</td>
  <td style="text-align: right;">14.3 GB</td>
  <td>Good quality, default size for most use cases, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q4_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_K_L.gguf">GGUF</a></td>
  <td>Q4_K_L</td>
  <td style="text-align: right;">14.8 GB</td>
  <td>Uses Q8_0 for embed and output weights. Good quality, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q4_1.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_1.gguf">GGUF</a></td>
  <td>Q4_1</td>
  <td style="text-align: right;">14.9 GB</td>
  <td>Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q5_K_S.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q5_K_S.gguf">GGUF</a></td>
  <td>Q5_K_S</td>
  <td style="text-align: right;">16.3 GB</td>
  <td>High quality, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q5_K_M.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q5_K_M.gguf">GGUF</a></td>
  <td>Q5_K_M</td>
  <td style="text-align: right;">16.8 GB</td>
  <td>High quality, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q5_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q5_K_L.gguf">GGUF</a></td>
  <td>Q5_K_L</td>
  <td style="text-align: right;">17.2 GB</td>
  <td>Uses Q8_0 for embed and output weights. High quality, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q6_K.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q6_K.gguf">GGUF</a></td>
  <td>Q6_K</td>
  <td style="text-align: right;">19.3 GB</td>
  <td>Very high quality, near perfect, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q6_K_L.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q6_K_L.gguf">GGUF</a></td>
  <td>Q6_K_L</td>
  <td style="text-align: right;">19.7 GB</td>
  <td>Uses Q8_0 for embed and output weights. Very high quality, near perfect, recommended.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.Q8_0.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q8_0.gguf">GGUF</a></td>
  <td>Q8_0</td>
  <td style="text-align: right;">25.1 GB</td>
  <td>Extremely high quality, generally unneeded but max available quant.</td>
  </tr>
  <tr>
- <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2.FP16.gguf">GGUF</a></td>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-UD-Q8_K_XL.gguf">GGUF</a></td>
+ <td>Q8_K_XL</td>
+ <td style="text-align: right;">29 GB</td>
+ <td>Uses FP16 for embed and output weights via Unsloth Dynamic 2.0, near perfect quality.</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-FP16.gguf">GGUF</a></td>
  <td>FP16</td>
  <td style="text-align: right;">47.2 GB</td>
- <td>Full BF16 weights.</td>
+ <td>Full BF16 weights, maximum quality.</td>
  </tr>
  <tr>
  <td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2/tree/main">SAFE</a></td>
@@ -563,6 +569,10 @@ a:hover {

  <p>If you need a quant that isn't uploaded you can open a request.</p>

+ <p>Here is a useful tool which allows you to recreate UD quants: <a href="https://github.com/electroglyph/quant_clone">https://github.com/electroglyph/quant_clone</a></p>
+
+ <img src="https://i.imgur.com/YnTHoO1.png" width="800"></img>
+
  Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
  <img src="https://www.nethype.de/huggingface_embed/quantpplgraph.png"></img>
  And here are Artefact2's thoughts on the matter: <a href="https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9">https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9</a>
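As an aside, the quant table in the updated README lends itself to programmatic selection. Below is a minimal Python sketch, not part of the repo: the sizes are copied from the table, the filenames follow the new `Cthulhu-24B-v1.2-<QUANT>.gguf` scheme, and `pick_quant` plus its `headroom_gb` default are illustrative assumptions (real memory use also depends on context length and KV cache, which the headroom only roughly covers):

```python
# Subset of the quant table above: name -> file size in GB (new filename scheme).
QUANTS = {
    "IQ2_M": 8.11,
    "Q3_K_M": 11.5,
    "IQ4_XS": 12.8,
    "Q4_K_M": 14.3,
    "Q5_K_M": 16.8,
    "Q6_K": 19.3,
    "Q8_0": 25.1,
}

BASE = ("https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF"
        "/resolve/main/Cthulhu-24B-v1.2-")


def pick_quant(budget_gb, headroom_gb=1.5):
    """Return the largest quant whose file fits in budget_gb minus headroom,
    or None if nothing fits. Headroom is a rough allowance for KV cache etc."""
    usable = budget_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANTS.items() if size <= usable]
    if not fitting:
        return None
    return max(fitting)[1]


if __name__ == "__main__":
    name = pick_quant(16.0)  # e.g. a 16 GB GPU -> Q4_K_M
    print(name, BASE + name + ".gguf")
```

The same idea extends to the full table; the subset here just keeps the sketch short.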