--- language: - en license: llama2 tags: - text generation - instruct datasets: - PygmalionAI/PIPPA - Open-Orca/OpenOrca - Norquinal/claude_multiround_chat_30k - jondurbin/airoboros-gpt4-1.4.1 - databricks/databricks-dolly-15k model_name: Mythalion 13B base_model: PygmalionAI/mythalion-13b inference: false model_creator: PygmalionAI model_type: llama pipeline_tag: text-generation prompt_template: 'Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: {prompt} ### Response: ' quantized_by: Core Tech Logic Srl --- # Mythalion 13B - GGUF - Model creator: [PygmalionAI](https://huggingface.co/PygmalionAI) - Original model: [Mythalion 13B](https://huggingface.co/PygmalionAI/mythalion-13b) ## Description This repo contains GGUF model files for [PygmalionAI's Mythalion 13B](https://huggingface.co/PygmalionAI/mythalion-13b) used in [LoveCore AI] (https://lovecoreai.com). ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. GGUF offers numerous advantages over GGML, such as better tokenisation, and support for special tokens. It is also supports metadata, and is designed to be extensible. Here is an incomplate list of clients and libraries that are known to support GGUF: * [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option. * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with many interesting and unique features, including a full model library for easy model selection. * [Faraday.dev](https://faraday.dev/), an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use. ## Prompt template: Alpaca ``` Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: {prompt} ### Response: ``` ## Compatibility These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) They are also compatible with many third party UIs and libraries - please see the list at the top of this README. ## Explanation of quantisation methods
Click to see details The new methods available are: * GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw. * GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K resulting in 5.5 bpw
### On the command line, including multiple files at once I recommend using the `huggingface-hub` Python library: ```shell pip3 install huggingface-hub>=0.17.1 ``` Then you can download any individual model file to the current directory, at high speed, with a command like this: ```shell huggingface-cli download ekchatzi-core/Mythalion-13B-GGUF mythalion-13b-Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False ```
More advanced huggingface-cli download usage You can also download multiple files at once with a pattern: ```shell huggingface-cli download ekchatzi-core/Mythalion-13B-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf' ``` For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli). To accelerate downloads on fast connections (1Gbit/s or higher), install `hf_transfer`: ```shell pip3 install hf_transfer ``` And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`: ```shell HUGGINGFACE_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download ekchatzi-core/Mythalion-13B-GGUF mythalion-13b-Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False ``` Windows CLI users: Use `set HUGGINGFACE_HUB_ENABLE_HF_TRANSFER=1` before running the download command.
## Example `llama.cpp` command Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later. ```shell ./main -ngl 32 -m mythalion-13b-Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{prompt}\n\n### Response:" ``` Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration. Change `-c 4096` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. If you want to have a chat-style conversation, replace the `-p ` argument with `-i -ins` For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md) ## How to run in `text-generation-webui` Further instructions here: [text-generation-webui/docs/llama.cpp.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp.md). ## How to run from Python code You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries. ### How to load this model from Python using ctransformers #### First install the package ```bash # Base ctransformers with no GPU acceleration pip install ctransformers>=0.2.24 # Or with CUDA GPU acceleration pip install ctransformers[cuda]>=0.2.24 # Or with ROCm GPU acceleration CT_HIPBLAS=1 pip install ctransformers>=0.2.24 --no-binary ctransformers # Or with Metal GPU acceleration for macOS systems CT_METAL=1 pip install ctransformers>=0.2.24 --no-binary ctransformers ``` #### Simple example code to load one of these GGUF models ```python from ctransformers import AutoModelForCausalLM # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system. llm = AutoModelForCausalLM.from_pretrained("ekchatzi-core/Mythalion-13B-GGUF", model_file="mythalion-13b-Q4_K_M.gguf", model_type="llama", gpu_layers=50) print(llm("AI is going to")) ``` ## How to use with LangChain Here's guides on using llama-cpp-python or ctransformers with LangChain: * [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp) * [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers) # Original model card: PygmalionAI's Mythalion 13B

Mythalion 13B

A merge of Pygmalion-2 13B and MythoMax 13B

## Model Details The long-awaited release of our new models based on Llama-2 is finally here. This model was created in collaboration with [Gryphe](https://huggingface.co/Gryphe), a mixture of our [Pygmalion-2 13B](https://huggingface.co/PygmalionAI/pygmalion-2-13b) and Gryphe's [Mythomax L2 13B](https://huggingface.co/Gryphe/MythoMax-L2-13b). Finer details of the merge are available in [our blogpost](https://pygmalionai.github.io/blog/posts/introducing_pygmalion_2/#mythalion-13b). According to our testers, this model seems to outperform MythoMax in RP/Chat. **Please make sure you follow the recommended generation settings for SillyTavern [here](https://pygmalionai.github.io/blog/posts/introducing_pygmalion_2/#sillytavern) for the best results!** This model is freely available for both commercial and non-commercial use, as per the Llama-2 license. ## Prompting This model can be prompted using both the Alpaca and [Pygmalion formatting](https://huggingface.co/PygmalionAI/pygmalion-2-13b#prompting). **Alpaca formatting**: ``` ### Instruction: ### Response: ``` **Pygmalion/Metharme formatting**: ``` <|system|>Enter RP mode. Pretend to be {{char}} whose persona follows: {{persona}} You shall reply to the user while staying in character, and generate long responses. <|user|>Hello!<|model|>{model's response goes here} ``` The model has been trained on prompts using three different roles, which are denoted by the following tokens: `<|system|>`, `<|user|>` and `<|model|>`. The `<|system|>` prompt can be used to inject out-of-channel information behind the scenes, while the `<|user|>` prompt should be used to indicate user input. The `<|model|>` token should then be used to indicate that the model should generate a response. These tokens can happen multiple times and be chained up to form a conversation history. ## Limitations and biases The intended use-case for this model is fictional writing for entertainment purposes. Any other sort of usage is out of scope. As such, it was **not** fine-tuned to be safe and harmless: the base model _and_ this fine-tune have been trained on data known to contain profanity and texts that are lewd or otherwise offensive. It may produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. Outputs might often be factually wrong or misleading. ## Acknowledgements We would like to thank [SpicyChat](https://spicychat.ai/) for sponsoring the training for the [Pygmalion-2 13B](https://huggingface.co/PygmalionAI/pygmalion-2-13b) model. [Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)