jartine/c4ai-command-r-plus-llamafile

jartine committed on Aug 20

Commit 224f275 • 1 Parent(s): da37751

Update README.md

Files changed (1)

README.md +2 -2

@@ -61,7 +61,7 @@ For further information, please see the [llamafile
 README](https://github.com/mozilla-ocho/llamafile/).
 Having **trouble?** See the ["Gotchas"
-section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas)
+section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
 of the README.
 ## About Upload Limits
@@ -117,7 +117,7 @@ Your choice of quantization format depends on three things:
 1. Will it fit in RAM or VRAM?
 2. Is your use case reading (e.g. summarization) or writing (e.g. chatbot)?
-3. llamafiles bigger than 4.30 GB are hard to run on Windows (see [gotchas](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas))
+3. llamafiles bigger than 4.30 GB are hard to run on Windows (see [gotchas](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting))
 Good quants for writing (prediction speed) are Q5\_K\_M, and Q4\_0. Text
 generation is bounded by memory speed, so smaller quants help, but they