# llamafile [![ci status](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml/badge.svg)](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml)
**llamafile lets you distribute and run LLMs with a single file. ([announcement blog post](https://hacks.mozilla.org/2023/11/introducing-llamafile/))** llamafile aims to make open LLMs much more accessible to both developers and end users. It does that by combining [llama.cpp](https://github.com/ggerganov/llama.cpp) with [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.

llamafile is a Mozilla Builders project.

## Quickstart

The easiest way to try it for yourself is to download the example llamafile for the [numind.NuExtract](numind/NuExtract-1.5) model (license: MIT, [OpenAI](https://openai.com/policies/terms-of-use)). With llamafile, you can run this model locally while consuming comparatively few resources, with good performance on CPU alone.

1. Download [numind.NuExtract-v1.5.Q5_K_M.llamafile](https://huggingface.co/Devarui379/numind.NuExtract-v1.5-Q5_K_M-llamafile/resolve/main/numind.NuExtract-v1.5.Q5_K_M.llamafile?download=true) (2.78 GB).

2. Open your computer's terminal.

3. If you're using macOS, Linux, or BSD, you'll need to grant permission for your computer to execute this new file. (You only need to do this once.)

```sh
chmod +x numind.NuExtract-v1.5.Q5_K_M.llamafile
```

4. If you're on Windows, rename the file by adding ".exe" to the end.

5. Run the llamafile, e.g.:

```sh
./numind.NuExtract-v1.5.Q5_K_M.llamafile
```

6. Your browser should open automatically and display a chat interface. (If it doesn't, just open your browser and point it at http://localhost:8080)

7. When you're done chatting, return to your terminal and hit `Control-C` to shut down llamafile.

**Having trouble? See the "Gotchas" section of the official [llamafile](https://github.com/Mozilla-Ocho/llamafile) GitHub page.**

## Distribution

One good way to share a llamafile with your friends is by posting it on Hugging Face. If you do that, it's recommended that you mention in your Hugging Face commit message which git revision or released version of llamafile you used to build it. That way everyone online will be able to verify the provenance of its executable content. If you've made changes to the llama.cpp or cosmopolitan source code, the Apache 2.0 license requires you to explain what changed. One way to do that is to embed a notice describing the changes in your llamafile using `zipalign`, and to mention it in your Hugging Face commit.

## Documentation

There's a manual page for each of the llamafile programs; they're installed when you run `sudo make install`. The command manuals are also typeset as PDF files that you can download from the GitHub releases page. Lastly, most commands will display that information when passed the `--help` flag.

## Running llamafile with models downloaded by third-party applications

This section answers the question *"I already have a model downloaded locally by application X, can I use it with llamafile?"*. The general answer is "yes, as long as those models are stored locally in GGUF format", but the procedure can be more or less hacky depending on the application. A few examples (tested on a Mac) follow.

### LM Studio

[LM Studio](https://lmstudio.ai/) stores downloaded models in `~/.cache/lm-studio/models`, in subdirectories named after the models (following Hugging Face's `account_name/model_name` format), with the same filename you saw when you chose to download the file.

So if you have downloaded e.g. the `llama-2-7b.Q2_K.gguf` file for `TheBloke/Llama-2-7B-GGUF`, you can run llamafile as follows:

```
cd ~/.cache/lm-studio/models/TheBloke/Llama-2-7B-GGUF
llamafile -m llama-2-7b.Q2_K.gguf
```
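If you prefer not to `cd` into LM Studio's cache directory, you can also locate the downloaded GGUF files and pass a full path to `-m`. A minimal sketch, using the same example model as above (the `find` invocation is just one convenient way to list the files):

```sh
# List every GGUF file LM Studio has downloaded, then run one by its full path.
find ~/.cache/lm-studio/models -name '*.gguf'
llamafile -m ~/.cache/lm-studio/models/TheBloke/Llama-2-7B-GGUF/llama-2-7b.Q2_K.gguf
```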
### Ollama

When you download a new model with [ollama](https://ollama.com), all of its metadata is stored in a manifest file under `~/.ollama/models/manifests/registry.ollama.ai/library/`. The directory and manifest file name are the model name as returned by `ollama list`. For instance, for `llama3:latest` the manifest file will be named `.ollama/models/manifests/registry.ollama.ai/library/llama3/latest`.

The manifest maps each file related to the model (e.g. GGUF weights, license, prompt template, etc.) to a sha256 digest. The digest corresponding to the element whose `mediaType` is `application/vnd.ollama.image.model` is the one that refers to the model's GGUF file.

Each sha256 digest is also used as a filename in the `~/.ollama/models/blobs` directory (if you look into that directory you'll see *only* those sha256-* filenames). This means you can run llamafile directly by passing the sha256 digest as the model filename. So if e.g. the `llama3:latest` GGUF file digest is `sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29`, you can run llamafile as follows:

```
cd ~/.ollama/models/blobs
llamafile -m sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
```

## Security

llamafile adds pledge() and SECCOMP sandboxing to llama.cpp. This is enabled by default. It can be turned off by passing the `--unsecure` flag. Sandboxing is currently only supported on Linux and OpenBSD on systems without GPUs; on other platforms it'll simply log a warning.

Our approach to security has these benefits:

1. After it starts up, your HTTP server isn't able to access the filesystem at all. This is good, since it means that if someone discovers a bug in the llama.cpp server, it's much less likely they'll be able to access sensitive information on your machine or make changes to its configuration. On Linux, we're able to sandbox things even further; the only networking-related system call the HTTP server is allowed to use after starting up is accept(). That further limits an attacker's ability to exfiltrate information, in the event that your HTTP server is compromised.

2. The main CLI command won't be able to access the network at all. This is enforced by the operating system kernel. It also won't be able to write to the file system. This keeps your computer safe in the event that a bug is ever discovered in the GGUF file format that lets an attacker craft malicious weights files and post them online. The only exception to this rule is if you pass the `--prompt-cache` flag without also specifying `--prompt-cache-ro`. In that case, security currently needs to be weakened to allow `cpath` and `wpath` access, but network access will remain forbidden. (See the sketch at the end of this README for how these flags fit together.)

Therefore your llamafile is able to protect itself against the outside world, but that doesn't mean you're protected from llamafile. Sandboxing is self-imposed. If you obtained your llamafile from an untrusted source, its author could simply have modified it not to do that. In that case, you can run the untrusted llamafile inside another sandbox, such as a virtual machine, to make sure it behaves how you expect.

## Licensing

While the llamafile project is Apache 2.0-licensed, the changes to llama.cpp are licensed under MIT (just like the llama.cpp project itself) so as to remain compatible and upstreamable in the future, should that be desired.

[![Star History Chart](https://api.star-history.com/svg?repos=Mozilla-Ocho/llamafile&type=Date)](https://star-history.com/#Mozilla-Ocho/llamafile&Date)
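As a footnote to the Security section above, here is a minimal sketch of how the flags it mentions fit together. It assumes the Quickstart model's filename, the usual llama.cpp `-p`/`--prompt` flag, and an illustrative `cache.bin` cache file name; adjust these to your setup.

```sh
# Sandboxing (pledge()/SECCOMP) is on by default; --unsecure turns it off entirely.
./numind.NuExtract-v1.5.Q5_K_M.llamafile --unsecure

# A writable prompt cache forces the sandbox to allow cpath/wpath access, so pass
# --prompt-cache-ro whenever you only need to read the cache.
./numind.NuExtract-v1.5.Q5_K_M.llamafile -p 'hello' \
  --prompt-cache cache.bin --prompt-cache-ro
```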