Upload README.md
Browse files

README.md (CHANGED)
@@ -61,7 +61,7 @@ This repo contains GGUF format model files for [OpenAccess AI Collective's Minotaur 13B Fixed]

<!-- README_GGUF.md-about-gguf start -->
### About GGUF

GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

Here is an incomplete list of clients and libraries that are known to support GGUF:

@@ -99,7 +99,7 @@ A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

<!-- compatibility_gguf start -->
## Compatibility

These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221).

They are also compatible with many third party UIs and libraries - please see the list at the top of this README.

@@ -192,25 +192,25 @@ pip3 install hf_transfer

And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:

```shell
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/minotaur-13B-fixed-GGUF minotaur-13b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```

Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.

</details>
<!-- README_GGUF.md-how-to-download end -->
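
For reference only (not part of the original README): the same single-file download can be scripted from Python with the `huggingface_hub` library, with `hf_transfer` optionally enabled through the same environment variable. A minimal sketch:

```python
import os

# Optional: enable the hf_transfer download backend (requires `pip3 install hf_transfer`).
# The variable must be set before huggingface_hub is imported.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download

# Download one GGUF file from the repo into the current directory.
path = hf_hub_download(
    repo_id="TheBloke/minotaur-13B-fixed-GGUF",
    filename="minotaur-13b.Q4_K_M.gguf",
    local_dir=".",
)
print(path)
```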

<!-- README_GGUF.md-how-to-run start -->
## Example `llama.cpp` command

Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.

```shell
./main -ngl 32 -m minotaur-13b.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:"
```

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

Change `-c 2048` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.

If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
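
Not part of the original README: the same invocation can be reproduced from Python with the `llama-cpp-python` package mentioned in the next section. A rough sketch, assuming `minotaur-13b.Q4_K_M.gguf` is in the current directory:

```python
from llama_cpp import Llama

# n_gpu_layers mirrors -ngl 32; set it to 0 if you have no GPU acceleration.
llm = Llama(
    model_path="minotaur-13b.Q4_K_M.gguf",
    n_ctx=2048,          # mirrors -c 2048
    n_gpu_layers=32,
)

prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: Write a haiku about llamas. ASSISTANT:"
)

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.7,       # mirrors --temp 0.7
    repeat_penalty=1.1,    # mirrors --repeat_penalty 1.1
    stop=["USER:"],
)
print(output["choices"][0]["text"])
```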

@@ -224,22 +224,24 @@ Further instructions here: [text-generation-webui/docs/llama.cpp.md](https://git

You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.

### How to load this model in Python code, using ctransformers

#### First install the package

Run one of the following commands, according to your system:

```shell
# Base ctransformers with no GPU acceleration
pip install ctransformers
# Or with CUDA GPU acceleration
pip install ctransformers[cuda]
# Or with AMD ROCm GPU acceleration (Linux only)
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems only
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```

#### Simple ctransformers example code

```python
from ctransformers import AutoModelForCausalLM
```
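
The hunk above ends inside that code block, so only the import line is visible here. As a sketch only (based on the ctransformers API and the `print(llm("AI is going to"))` call used as context in the next hunk, not text recovered from the README), a typical loading example looks like:

```python
from ctransformers import AutoModelForCausalLM

# gpu_layers: number of layers to offload to GPU; set to 0 for CPU-only inference.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/minotaur-13B-fixed-GGUF",
    model_file="minotaur-13b.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50,
)

print(llm("AI is going to"))
```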

@@ -252,7 +254,7 @@ print(llm("AI is going to"))

## How to use with LangChain

Here are guides on using llama-cpp-python and ctransformers with LangChain:

* [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
* [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
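
The guides linked above are the authoritative reference. As one minimal sketch (assuming the `langchain-community` package and a locally downloaded `minotaur-13b.Q4_K_M.gguf`; in older LangChain versions the import path is `langchain.llms` instead), the ctransformers route looks roughly like this:

```python
# pip install langchain-community ctransformers
from langchain_community.llms import CTransformers

# Wrap the local GGUF file as a LangChain LLM.
llm = CTransformers(
    model="minotaur-13b.Q4_K_M.gguf",
    model_type="llama",
    config={"max_new_tokens": 256, "temperature": 0.7},
)

prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: What is GGUF? ASSISTANT:"
)
print(llm.invoke(prompt))
```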