|
# llamafile |
|
|
|
[](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml)<br/> |
|
|
|
|
|
**llamafile lets you distribute and run LLMs with a single file. ([announcement blog post](https://hacks.mozilla.org/2023/11/introducing-llamafile/))** |
|
|
|
llamafile aims to make open LLMs much more |
|
accessible to both developers and end users. It does that by
|
combining [llama.cpp](https://github.com/ggerganov/llama.cpp) with [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) into one |
|
framework that collapses all the complexity of LLMs down to |
|
a single-file executable (called a "llamafile") that runs |
|
locally on most computers, with no installation.<br/><br/> |
|
|
|
llamafile is a Mozilla Builders project. |
|
## Quickstart |
|
|
|
The easiest way to try it for yourself is to download the example |
|
llamafile for the [numind/NuExtract-1.5](https://huggingface.co/numind/NuExtract-1.5) model (license: MIT).

With llamafile, you can run this model locally while consuming comparatively few resources and getting good performance even on CPU alone.
|
|
|
1. Download [numind.NuExtract-v1.5.Q5_K_M.llamafile](https://huggingface.co/Devarui379/numind.NuExtract-v1.5-Q5_K_M-llamafile/resolve/main/numind.NuExtract-v1.5.Q5_K_M.llamafile?download=true) (2.78 GB). |
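
If you prefer the command line, you can fetch the same file with `curl` (the URL is the same one behind the link above; any download tool works):

```sh
curl -L -o numind.NuExtract-v1.5.Q5_K_M.llamafile \
  "https://huggingface.co/Devarui379/numind.NuExtract-v1.5-Q5_K_M-llamafile/resolve/main/numind.NuExtract-v1.5.Q5_K_M.llamafile?download=true"
```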
|
|
|
2. Open your computer's terminal. |
|
|
|
3. If you're using macOS, Linux, or BSD, you'll need to grant permission |
|
for your computer to execute this new file. (You only need to do this |
|
once.) |
|
|
|
```sh |
|
chmod +x numind.NuExtract-v1.5.Q5_K_M.llamafile |
|
``` |
|
|
|
4. If you're on Windows, rename the file by adding ".exe" on the end. |
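
One way to do the rename from a terminal (in PowerShell, `mv` is an alias for `Move-Item`; the same command works in Git Bash or WSL):

```sh
mv numind.NuExtract-v1.5.Q5_K_M.llamafile numind.NuExtract-v1.5.Q5_K_M.llamafile.exe
```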
|
|
|
5. Run the llamafile. e.g.: |
|
|
|
```sh |
|
./numind.NuExtract-v1.5.Q5_K_M.llamafile |
|
``` |
|
|
|
6. Your browser should open automatically and display a chat interface. |
|
(If it doesn't, just open your browser and point it at http://localhost:8080. A command-line alternative is sketched just after this list.)
|
|
|
7. When you're done chatting, return to your terminal and hit |
|
`Control-C` to shut down llamafile. |
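
As a command-line alternative to the browser chat in step 6, the llamafile server also exposes an OpenAI-compatible JSON API on the same port. A minimal sketch, assuming the server is listening on the default `http://localhost:8080` (the `model` field is just a label here):

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "NuExtract-v1.5",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```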
|
|
|
**Having trouble? See the "Gotchas" section of the official [llamafile](https://github.com/Mozilla-Ocho/llamafile) GitHub page.**
|
|
|
## Distribution |
|
|
|
One good way to share a llamafile with your friends is by posting it on |
|
Hugging Face. If you do that, then it's recommended that you mention in |
|
your Hugging Face commit message what git revision or released version |
|
of llamafile you used when building your llamafile. That way everyone |
|
online will be able to verify the provenance of its executable content. If
|
you've made changes to the llama.cpp or cosmopolitan source code, then |
|
the Apache 2.0 license requires you to explain what changed. One way you |
|
can do that is by embedding a notice in your llamafile using `zipalign` |
|
that describes the changes, and mention it in your Hugging Face commit. |
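
For example, a plain-text notice can be embedded in the archive with the `zipalign` program that ships with llamafile (the filenames below are placeholders):

```sh
# Append NOTICE.txt (describing your llama.cpp/cosmopolitan changes) to the llamafile archive.
zipalign -j0 my-model.llamafile NOTICE.txt
```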
|
|
|
## Documentation |
|
|
|
There's a manual page for each of the llamafile programs installed when you |
|
run `sudo make install`. The command manuals are also typeset as PDF |
|
files that you can download from the GitHub releases page. Lastly, most |
|
commands will display that information when passing the `--help` flag. |
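
For example, assuming you've run `sudo make install`:

```sh
man llamafile      # full manual page for the main command
llamafile --help   # condensed usage summary, no man pages required
```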
|
|
|
## Running llamafile with models downloaded by third-party applications |
|
|
|
This section answers the question *"I already have a model downloaded locally by application X, can I use it with llamafile?"*. The general answer is "yes, as long as those models are locally stored in GGUF format" but its implementation can be more or less hacky depending on the application. A few examples (tested on a Mac) follow. |
|
|
|
### LM Studio |
|
[LM Studio](https://lmstudio.ai/) stores downloaded models in `~/.cache/lm-studio/models`, in subdirectories named after the models (following Hugging Face's `account_name/model_name` format), using the same filename you saw when you chose to download the file.
|
|
|
So if you have downloaded e.g. the `llama-2-7b.Q2_K.gguf` file for `TheBloke/Llama-2-7B-GGUF`, you can run llamafile as follows: |
|
|
|
``` |
|
cd ~/.cache/lm-studio/models/TheBloke/Llama-2-7B-GGUF |
|
llamafile -m llama-2-7b.Q2_K.gguf |
|
``` |
|
|
|
### Ollama |
|
|
|
When you download a new model with [ollama](https://ollama.com), all its metadata will be stored in a manifest file under `~/.ollama/models/manifests/registry.ollama.ai/library/`. The directory and manifest file name are the model name as returned by `ollama list`. For instance, for `llama3:latest` the manifest file will be named `~/.ollama/models/manifests/registry.ollama.ai/library/llama3/latest`.
|
|
|
The manifest maps each file related to the model (e.g. GGUF weights, license, prompt template, etc) to a sha256 digest. The digest corresponding to the element whose `mediaType` is `application/vnd.ollama.image.model` is the one referring to the model's GGUF file. |
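
For example, this sketch prints that digest for `llama3:latest`, assuming `jq` is installed and that the manifest uses the usual OCI-style `layers` array. Depending on your ollama version, the digest may be printed as `sha256:<hex>` while the blob filename uses `sha256-<hex>` (colon replaced by a dash):

```sh
jq -r '.layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest' \
  ~/.ollama/models/manifests/registry.ollama.ai/library/llama3/latest
```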
|
|
|
Each sha256 digest is also used as a filename in the `~/.ollama/models/blobs` directory (if you look into that directory you'll see *only* those sha256-* filenames). This means you can directly run llamafile by passing the sha256 digest as the model filename. So if e.g. the `llama3:latest` GGUF file digest is `sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29`, you can run llamafile as follows: |
|
|
|
``` |
|
cd ~/.ollama/models/blobs |
|
llamafile -m sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 |
|
``` |
|
|
|
|
|
|
|
## Security |
|
|
|
llamafile adds pledge() and SECCOMP sandboxing to llama.cpp. This is |
|
enabled by default. It can be turned off by passing the `--unsecure` |
|
flag. Sandboxing is currently only supported on Linux and OpenBSD on |
|
systems without GPUs; on other platforms it'll simply log a warning. |
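
For example, to opt out of sandboxing (not generally recommended), pass the flag when launching the Quickstart llamafile:

```sh
./numind.NuExtract-v1.5.Q5_K_M.llamafile --unsecure
```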
|
|
|
Our approach to security has these benefits: |
|
|
|
1. After it starts up, your HTTP server isn't able to access the |
|
filesystem at all. This is good, since it means if someone discovers |
|
a bug in the llama.cpp server, then it's much less likely they'll be |
|
able to access sensitive information on your machine or make changes |
|
to its configuration. On Linux, we're able to sandbox things even |
|
further; the only networking-related system call the HTTP server will be

allowed to use after starting up is accept(). That further limits an
|
attacker's ability to exfiltrate information, in the event that your |
|
HTTP server is compromised. |
|
|
|
2. The main CLI command won't be able to access the network at all. This |
|
is enforced by the operating system kernel. It also won't be able to |
|
write to the file system. This keeps your computer safe in the event |
|
that a bug is ever discovered in the GGUF file format that lets |
|
an attacker craft malicious weights files and post them online. The |
|
only exception to this rule is if you pass the `--prompt-cache` flag |
|
without also specifying `--prompt-cache-ro`. In that case, security |
|
currently needs to be weakened to allow `cpath` and `wpath` access, |
|
but network access will remain forbidden. |
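
For example, this invocation keeps the prompt cache read-only so the stricter policy stays in effect (a sketch; `prompt.bin` is a placeholder filename):

```sh
./numind.NuExtract-v1.5.Q5_K_M.llamafile -p 'Hello' \
  --prompt-cache prompt.bin --prompt-cache-ro
```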
|
|
|
Therefore your llamafile is able to protect itself against the outside |
|
world, but that doesn't mean you're protected from llamafile. Sandboxing |
|
is self-imposed. If you obtained your llamafile from an untrusted source |
|
then its author could have simply modified it to not do that. In that |
|
case, you can run the untrusted llamafile inside another sandbox, such |
|
as a virtual machine, to make sure it behaves how you expect. |
|
|
|
## Licensing |
|
|
|
While the llamafile project is Apache 2.0-licensed, the changes |
|
to llama.cpp are licensed under MIT (just like the llama.cpp project |
|
itself) so as to remain compatible and upstreamable in the future, |
|
should that be desired. |
|
|
|
|
|
|
|
[](https://star-history.com/#Mozilla-Ocho/llamafile&Date) |
|
|