|
# llamafile |
|
|
|
[](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml)<br/> |
|
|
|
|
|
**llamafile lets you distribute and run LLMs with a single file. ([announcement blog post](https://hacks.mozilla.org/2023/11/introducing-llamafile/))** |
|
|
|
llamafile aims to make open LLMs much more |
|
accessible to both developers and end users. It does that by
|
combining [llama.cpp](https://github.com/ggerganov/llama.cpp) with [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) into one |
|
framework that collapses all the complexity of LLMs down to |
|
a single-file executable (called a "llamafile") that runs |
|
locally on most computers, with no installation.<br/><br/> |
|
|
|
llamafile is a Mozilla Builders project. |
|
## Quickstart |
|
|
|
The easiest way to try it for yourself is to download the example |
|
llamafile for the [numind/NuExtract-1.5](https://huggingface.co/numind/NuExtract-1.5) model (license: MIT).

With llamafile, you can run this model locally while consuming comparatively few resources and getting good performance even on CPU alone.
|
|
|
1. Download [numind.NuExtract-v1.5.Q5_K_M.llamafile](https://huggingface.co/Devarui379/numind.NuExtract-v1.5-Q5_K_M-llamafile/resolve/main/numind.NuExtract-v1.5.Q5_K_M.llamafile?download=true) (2.78 GB). |
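
If you prefer the command line, you can fetch the same file with `curl` (the URL is the same one behind the link above; any download tool works):

```sh
curl -L -o numind.NuExtract-v1.5.Q5_K_M.llamafile \
  "https://huggingface.co/Devarui379/numind.NuExtract-v1.5-Q5_K_M-llamafile/resolve/main/numind.NuExtract-v1.5.Q5_K_M.llamafile?download=true"
```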
|
|
|
2. Open your computer's terminal. |
|
|
|
3. If you're using macOS, Linux, or BSD, you'll need to grant permission |
|
for your computer to execute this new file. (You only need to do this |
|
once.) |
|
|
|
```sh |
|
chmod +x numind.NuExtract-v1.5.Q5_K_M.llamafile |
|
``` |
|
|
|
4. If you're on Windows, rename the file by adding ".exe" on the end. |
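
One way to do the rename from a terminal (in PowerShell, `mv` is an alias for `Move-Item`; the same command works in Git Bash or WSL):

```sh
mv numind.NuExtract-v1.5.Q5_K_M.llamafile numind.NuExtract-v1.5.Q5_K_M.llamafile.exe
```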
|
|
|
5. Run the llamafile. e.g.: |
|
|
|
```sh |
|
./numind.NuExtract-v1.5.Q5_K_M.llamafile |
|
``` |
|
|
|
6. Your browser should open automatically and display a chat interface. |
|
(If it doesn't, just open your browser and point it at http://localhost:8080. A command-line alternative is sketched just after this list.)
|
|
|
7. When you're done chatting, return to your terminal and hit |
|
`Control-C` to shut down llamafile. |
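
As a command-line alternative to the browser chat in step 6, the llamafile server also exposes an OpenAI-compatible JSON API on the same port. A minimal sketch, assuming the server is listening on the default `http://localhost:8080` (the `model` field is just a label here):

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "NuExtract-v1.5",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```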
|
|
|
**Having trouble? See the "Gotchas" section of the official [llamafile](https://github.com/Mozilla-Ocho/llamafile) GitHub page.**
|
|
|
## Distribution |
|
|
|
One good way to share a llamafile with your friends is by posting it on |
|
Hugging Face. If you do that, then it's recommended that you mention in |
|
your Hugging Face commit message what git revision or released version |
|
of llamafile you used when building your llamafile. That way everyone |
|
online will be able to verify the provenance of its executable content. If
|
you've made changes to the llama.cpp or cosmopolitan source code, then |
|
the Apache 2.0 license requires you to explain what changed. One way you |
|
can do that is by embedding a notice in your llamafile using `zipalign` |
|
that describes the changes, and mention it in your Hugging Face commit. |
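
For example, a plain-text notice can be embedded in the archive with the `zipalign` program that ships with llamafile (the filenames below are placeholders):

```sh
# Append NOTICE.txt (describing your llama.cpp/cosmopolitan changes) to the llamafile archive.
zipalign -j0 my-model.llamafile NOTICE.txt
```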
|
|
|
## Documentation |
|
|
|
There's a manual page for each of the llamafile programs installed when you |
|
run `sudo make install`. The command manuals are also typeset as PDF |
|
files that you can download from the GitHub releases page. Lastly, most |
|
commands will display that information when passing the `--help` flag. |
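
For example, assuming you've run `sudo make install`:

```sh
man llamafile      # full manual page for the main command
llamafile --help   # condensed usage summary, no man pages required
```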
|
|
|
## Running llamafile with models downloaded by third-party applications |
|
|
|
This section answers the question *"I already have a model downloaded locally by application X, can I use it with llamafile?"*. The general answer is "yes, as long as those models are locally stored in GGUF format" but its implementation can be more or less hacky depending on the application. A few examples (tested on a Mac) follow. |
|
|
|
### LM Studio |
|
[LM Studio](https://lmstudio.ai/) stores downloaded models in `~/.cache/lm-studio/models`, in subdirectories named after the models (following Hugging Face's `account_name/model_name` format), using the same filename you saw when you chose to download the file.
|
|
|
So if you have downloaded e.g. the `llama-2-7b.Q2_K.gguf` file for `TheBloke/Llama-2-7B-GGUF`, you can run llamafile as follows: |
|
|
|
``` |
|
cd ~/.cache/lm-studio/models/TheBloke/Llama-2-7B-GGUF |
|
llamafile -m llama-2-7b.Q2_K.gguf |
|
``` |
|
|
|
### Ollama |
|
|
|
When you download a new model with [ollama](https://ollama.com), all its metadata will be stored in a manifest file under `~/.ollama/models/manifests/registry.ollama.ai/library/`. The directory and manifest file name are the model name as returned by `ollama list`. For instance, for `llama3:latest` the manifest file will be named `~/.ollama/models/manifests/registry.ollama.ai/library/llama3/latest`.
|
|
|
The manifest maps each file related to the model (e.g. GGUF weights, license, prompt template, etc) to a sha256 digest. The digest corresponding to the element whose `mediaType` is `application/vnd.ollama.image.model` is the one referring to the model's GGUF file. |
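
For example, this sketch prints that digest for `llama3:latest`, assuming `jq` is installed and that the manifest uses the usual OCI-style `layers` array. Depending on your ollama version, the digest may be printed as `sha256:<hex>` while the blob filename uses `sha256-<hex>` (colon replaced by a dash):

```sh
jq -r '.layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest' \
  ~/.ollama/models/manifests/registry.ollama.ai/library/llama3/latest
```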
|
|
|
Each sha256 digest is also used as a filename in the `~/.ollama/models/blobs` directory (if you look into that directory you'll see *only* those sha256-* filenames). This means you can directly run llamafile by passing the sha256 digest as the model filename. So if e.g. the `llama3:latest` GGUF file digest is `sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29`, you can run llamafile as follows: |
|
|
|
``` |
|
cd ~/.ollama/models/blobs |
|
llamafile -m sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 |
|
``` |
|
|
|
|
|
|
|
## Security |
|
|
|
llamafile adds pledge() and SECCOMP sandboxing to llama.cpp. This is |
|
enabled by default. It can be turned off by passing the `--unsecure` |
|
flag. Sandboxing is currently only supported on Linux and OpenBSD on |
|
systems without GPUs; on other platforms it'll simply log a warning. |
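
For example, to opt out of sandboxing (not generally recommended), pass the flag when launching the Quickstart llamafile:

```sh
./numind.NuExtract-v1.5.Q5_K_M.llamafile --unsecure
```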
|
|
|
Our approach to security has these benefits: |
|
|
|
1. After it starts up, your HTTP server isn't able to access the |
|
filesystem at all. This is good, since it means if someone discovers |
|
a bug in the llama.cpp server, then it's much less likely they'll be |
|
able to access sensitive information on your machine or make changes |
|
to its configuration. On Linux, we're able to sandbox things even |
|
further; the only networking-related system call the HTTP server will be

allowed to use after starting up is accept(). That further limits an
|
attacker's ability to exfiltrate information, in the event that your |
|
HTTP server is compromised. |
|
|
|
2. The main CLI command won't be able to access the network at all. This |
|
is enforced by the operating system kernel. It also won't be able to |
|
write to the file system. This keeps your computer safe in the event |
|
that a bug is ever discovered in the GGUF file format that lets |
|
an attacker craft malicious weights files and post them online. The |
|
only exception to this rule is if you pass the `--prompt-cache` flag |
|
without also specifying `--prompt-cache-ro`. In that case, security |
|
currently needs to be weakened to allow `cpath` and `wpath` access, |
|
but network access will remain forbidden. |
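
For example, this invocation keeps the prompt cache read-only so the stricter policy stays in effect (a sketch; `prompt.bin` is a placeholder filename):

```sh
./numind.NuExtract-v1.5.Q5_K_M.llamafile -p 'Hello' \
  --prompt-cache prompt.bin --prompt-cache-ro
```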
|
|
|
Therefore your llamafile is able to protect itself against the outside |
|
world, but that doesn't mean you're protected from llamafile. Sandboxing |
|
is self-imposed. If you obtained your llamafile from an untrusted source |
|
then its author could have simply modified it to not do that. In that |
|
case, you can run the untrusted llamafile inside another sandbox, such |
|
as a virtual machine, to make sure it behaves how you expect. |
|
|
|
## Licensing |
|
|
|
While the llamafile project is Apache 2.0-licensed, the changes |
|
to llama.cpp are licensed under MIT (just like the llama.cpp project |
|
itself) so as to remain compatible and upstreamable in the future, |
|
should that be desired. |
|
|
|
|
|
|
|
[](https://star-history.com/#Mozilla-Ocho/llamafile&Date) |
|
|