Update README.md
README.md CHANGED
@@ -379,178 +379,184 @@ a:hover {
</thead>
<tbody>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ1_S.gguf">GGUF</a></td>
<td>IQ1_S</td>
<td style="text-align: right;">5.27 GB</td>
<td>Lowest quality, uses SOTA techniques to be usable.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ1_M.gguf">GGUF</a></td>
<td>IQ1_M</td>
<td style="text-align: right;">5.75 GB</td>
<td>Extremely low quality, uses SOTA techniques to be usable.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ2_XXS.gguf">GGUF</a></td>
<td>IQ2_XXS</td>
<td style="text-align: right;">6.55 GB</td>
<td>Very low quality, uses SOTA techniques to be usable.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ2_XS.gguf">GGUF</a></td>
<td>IQ2_XS</td>
<td style="text-align: right;">7.21 GB</td>
<td>Low quality, uses SOTA techniques to be usable.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ2_S.gguf">GGUF</a></td>
<td>IQ2_S</td>
<td style="text-align: right;">7.48 GB</td>
<td>Low quality, uses SOTA techniques to be usable.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ2_M.gguf">GGUF</a></td>
<td>IQ2_M</td>
<td style="text-align: right;">8.11 GB</td>
<td>Relatively low quality, uses SOTA techniques to be surprisingly usable.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q2_K.gguf">GGUF</a></td>
<td>Q2_K</td>
<td style="text-align: right;">8.89 GB</td>
<td>Very low quality but surprisingly usable.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ3_XXS.gguf">GGUF</a></td>
<td>IQ3_XXS</td>
<td style="text-align: right;">9.28 GB</td>
<td>Lower quality, new method with decent performance, comparable to Q3 quants.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q2_K_L.gguf">GGUF</a></td>
<td>Q2_K_L</td>
<td style="text-align: right;">9.55 GB</td>
<td>Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ3_XS.gguf">GGUF</a></td>
<td>IQ3_XS</td>
<td style="text-align: right;">9.91 GB</td>
<td>Lower quality, new method with decent performance, slightly better than Q3_K_S.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ3_S.gguf">GGUF</a></td>
<td>IQ3_S</td>
<td style="text-align: right;">10.4 GB</td>
<td>Lower quality, slightly better than IQ3_XS.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q3_K_S.gguf">GGUF</a></td>
<td>Q3_K_S</td>
<td style="text-align: right;">10.4 GB</td>
<td>Low quality, not recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ3_M.gguf">GGUF</a></td>
<td>IQ3_M</td>
<td style="text-align: right;">10.7 GB</td>
<td>Medium-low quality, new method with decent performance comparable to Q3_K_M.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q3_K_M.gguf">GGUF</a></td>
<td>Q3_K_M</td>
<td style="text-align: right;">11.5 GB</td>
<td>Lower quality but usable, good for low RAM availability.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q3_K_L.gguf">GGUF</a></td>
<td>Q3_K_L</td>
<td style="text-align: right;">12.4 GB</td>
<td>Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ4_XS.gguf">GGUF</a></td>
<td>IQ4_XS</td>
<td style="text-align: right;">12.8 GB</td>
<td>Decent quality, smaller than Q4_K_S with similar performance, recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-IQ4_NL.gguf">GGUF</a></td>
<td>IQ4_NL</td>
<td style="text-align: right;">13.5 GB</td>
<td>Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_0.gguf">GGUF</a></td>
<td>Q4_0</td>
<td style="text-align: right;">13.5 GB</td>
<td>Legacy format, offers online repacking for ARM and AVX CPU inference.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_K_S.gguf">GGUF</a></td>
<td>Q4_K_S</td>
<td style="text-align: right;">13.5 GB</td>
<td>Slightly lower quality with more space savings, recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_K_M.gguf">GGUF</a></td>
<td>Q4_K_M</td>
<td style="text-align: right;">14.3 GB</td>
<td>Good quality, default size for most use cases, recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_K_L.gguf">GGUF</a></td>
<td>Q4_K_L</td>
<td style="text-align: right;">14.8 GB</td>
<td>Uses Q8_0 for embed and output weights. Good quality, recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q4_1.gguf">GGUF</a></td>
<td>Q4_1</td>
<td style="text-align: right;">14.9 GB</td>
<td>Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q5_K_S.gguf">GGUF</a></td>
<td>Q5_K_S</td>
<td style="text-align: right;">16.3 GB</td>
<td>High quality, recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q5_K_M.gguf">GGUF</a></td>
<td>Q5_K_M</td>
<td style="text-align: right;">16.8 GB</td>
<td>High quality, recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q5_K_L.gguf">GGUF</a></td>
<td>Q5_K_L</td>
<td style="text-align: right;">17.2 GB</td>
<td>Uses Q8_0 for embed and output weights. High quality, recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q6_K.gguf">GGUF</a></td>
<td>Q6_K</td>
<td style="text-align: right;">19.3 GB</td>
<td>Very high quality, near perfect, recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q6_K_L.gguf">GGUF</a></td>
<td>Q6_K_L</td>
<td style="text-align: right;">19.7 GB</td>
<td>Uses Q8_0 for embed and output weights. Very high quality, near perfect, recommended.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-Q8_0.gguf">GGUF</a></td>
<td>Q8_0</td>
<td style="text-align: right;">25.1 GB</td>
<td>Extremely high quality, generally unneeded but max available quant.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-UD-Q8_K_XL.gguf">GGUF</a></td>
<td>Q8_K_XL</td>
<td style="text-align: right;">29 GB</td>
<td>Uses FP16 for embed and output weights via Unsloth Dynamic 2.0, near perfect quality.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2-GGUF/resolve/main/Cthulhu-24B-v1.2-FP16.gguf">GGUF</a></td>
<td>FP16</td>
<td style="text-align: right;">47.2 GB</td>
<td>Full BF16 weights, maximum quality.</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Fentible/Cthulhu-24B-v1.2/tree/main">SAFE</a></td>
@@ -563,6 +569,10 @@ a:hover {

<p>If you need a quant that isn't uploaded you can open a request.</p>
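A minimal sketch (not part of the original card) of fetching one file from the table above and loading it locally, assuming huggingface_hub and llama-cpp-python are installed; the Q4_K_M pick is just an example:

```python
# Sketch: download a single GGUF from this repo and run a quick prompt.
# Assumes: pip install huggingface_hub llama-cpp-python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one file from the quant table (example choice: Q4_K_M, ~14.3 GB).
model_path = hf_hub_download(
    repo_id="Fentible/Cthulhu-24B-v1.2-GGUF",
    filename="Cthulhu-24B-v1.2-Q4_K_M.gguf",
)

# Load the quant; n_gpu_layers=-1 offloads all layers if a GPU build is available.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)
print(llm("Describe the city of R'lyeh in one sentence.", max_tokens=64)["choices"][0]["text"])
```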

<p>Here is a useful tool which allows you to recreate UD quants: <a href="https://github.com/electroglyph/quant_clone">https://github.com/electroglyph/quant_clone</a></p>

<img src="https://i.imgur.com/YnTHoO1.png" width="800"></img>

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
<img src="https://www.nethype.de/huggingface_embed/quantpplgraph.png"></img>
And here are Artefact2's thoughts on the matter: <a href="https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9">https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9</a>
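If you want to check how a particular quant of this model behaves, perplexity numbers like those in the graph above can be reproduced with llama.cpp's llama-perplexity tool; a hedged sketch, with the binary and dataset paths as placeholders:

```python
# Sketch: measure perplexity of a downloaded quant with llama.cpp's llama-perplexity.
# Lower perplexity generally means less quality loss from quantization.
import subprocess

subprocess.run(
    [
        "./llama-perplexity",                  # llama.cpp perplexity binary (path assumed)
        "-m", "Cthulhu-24B-v1.2-Q4_K_M.gguf",  # quant to evaluate (example from the table)
        "-f", "wiki.test.raw",                 # plain-text evaluation corpus (placeholder)
    ],
    check=True,
)
```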