---
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
---

# Uploaded model

- **Developed by:** EpistemeAI2
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

# Model Card for EpistemeAI2's Fireball-Mistral-Nemo-Instruct-emo-PHD, a fine-tuned Mistral-Nemo-Instruct-2407

Fireball-Mistral-Nemo-Instruct-emo-PHD is EpistemeAI2's fine-tune of the Mistral-Nemo-Instruct-2407 Large Language Model (LLM), which is itself an instruct fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407). Mistral Nemo was trained jointly by Mistral AI and NVIDIA, and it significantly outperforms existing models smaller or similar in size.

For more details about the underlying Mistral Nemo model, please refer to the Mistral AI release [blog post](https://mistral.ai/news/mistral-nemo/).

## Key features

- Released under the **Apache 2 License**
- Pre-trained and instructed versions
- Trained with a **128k context window**
- Trained on a large proportion of **multilingual and code data**
- Drop-in replacement of Mistral 7B

## Model Architecture

Mistral Nemo is a transformer model, with the following architecture choices:

- **Layers:** 40
- **Dim:** 5,120
- **Head dim:** 128
- **Hidden dim:** 14,336
- **Activation Function:** SwiGLU
- **Number of heads:** 32
- **Number of kv-heads:** 8 (GQA)
- **Vocabulary size:** 2**17 ~= 128k
- **Rotary embeddings (theta = 1M)**

## Training data

Fireball-Mistral-Nemo-Instruct-emo-PHD is fine-tuned on a dataset of **simulated emotions and philosophy in deductive reasoning, math, and science**.

### Mistral Inference

#### Install

```
pip install mistral_inference
```

#### Download

```py
from huggingface_hub import snapshot_download
from pathlib import Path

# Local directory that will hold the downloaded weights and tokenizer
mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-Instruct')
mistral_models_path.mkdir(parents=True, exist_ok=True)

# Fetch only the files that mistral_inference needs
snapshot_download(
    repo_id="EpistemeAI2/Fireball-Mistral-Nemo-Instruct-emo-PHD",
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=mistral_models_path,
)
```

### Transformers

> [!IMPORTANT]
> NOTE: Until a new release has been made, you need to install transformers from source:
> ```sh
> pip install git+https://github.com/huggingface/transformers.git
> ```

If you want to use Hugging Face `transformers` to generate text, you can do something like this.

```py
from transformers import pipeline

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Build a chat pipeline; the chat template is applied to `messages` automatically
chatbot = pipeline("text-generation", model="EpistemeAI2/Fireball-Mistral-Nemo-Instruct-emo-PHD", max_new_tokens=128)
chatbot(messages)
```

## Function calling with `transformers`

To use this example, you'll need `transformers` version 4.42.0 or higher.
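
As a quick sanity check of the version requirement, the installed package version can be compared against 4.42.0 before running the example. The sketch below is only illustrative: the helper names `release_tuple` and `meets_minimum` are hypothetical (not part of `transformers`), and the comparison looks only at the leading numeric components, so pre-release suffixes such as `.dev0` or `rc1` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.42.0"  # minimum version stated in this section


def release_tuple(version_string: str) -> tuple:
    """Return the leading numeric components, e.g. '4.43.0.dev0' -> (4, 43, 0)."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Compare release tuples, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = version("transformers")
    print(f"transformers {installed}: new enough = {meets_minimum(installed)}")
except PackageNotFoundError:
    print("transformers is not installed")
```
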
Please see the [function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling) in the `transformers` docs for more information.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "EpistemeAI2/Fireball-Mistral-Nemo-Instruct-emo-PHD"
tokenizer = AutoTokenizer.from_pretrained(model_id)

def get_current_weather(location: str, format: str):
    """
    Get the current weather

    Args:
        location: The city and state, e.g. San Francisco, CA
        format: The temperature unit to use. Infer this from the user's location. (choices: ["celsius", "fahrenheit"])
    """
    pass

conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
tools = [get_current_weather]

# format and tokenize the tool use prompt
inputs = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# move the tokenized inputs to the same device as the model
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that, for reasons of space, this example does not show a complete cycle of calling a tool and adding the tool call and tool results to the chat history so that the model can use them in its next generation. For a full tool calling example, please see the [function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling), and note that Mistral **does** use tool call IDs, so these must be included in your tool calls and tool results. They should be exactly 9 alphanumeric characters.

> [!TIP]
> Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend using a temperature of 0.3.
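
To make the tool call ID requirement concrete, here is a minimal sketch of how a tool call and its result might be added to the chat history with a matching 9-character alphanumeric ID. The message layout follows the one shown in the `transformers` function calling guide; whether this model's chat template accepts exactly this layout is an assumption. The helpers `make_tool_call_id` and `append_tool_round` are hypothetical names, and the weather string is a placeholder, not real tool or model output.

```python
import secrets
import string

ALPHANUMERIC = string.ascii_letters + string.digits


def make_tool_call_id(length: int = 9) -> str:
    """Return a random ID made of exactly `length` alphanumeric characters."""
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(length))


def append_tool_round(conversation: list, name: str, arguments: dict, result: str) -> str:
    """Append an assistant tool call and the matching tool result to the history."""
    tool_call_id = make_tool_call_id()
    # The assistant turn records which function was called and with what arguments.
    conversation.append({
        "role": "assistant",
        "tool_calls": [{
            "id": tool_call_id,
            "type": "function",
            "function": {"name": name, "arguments": arguments},
        }],
    })
    # The tool turn carries the result and must reuse the same ID.
    conversation.append({
        "role": "tool",
        "tool_call_id": tool_call_id,
        "name": name,
        "content": result,
    })
    return tool_call_id


conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
call_id = append_tool_round(
    conversation,
    name="get_current_weather",
    arguments={"location": "Paris, France", "format": "celsius"},
    result="22 degrees celsius",  # placeholder value for illustration only
)
print(call_id, len(conversation))
```

The extended `conversation` can then be passed back through `tokenizer.apply_chat_template(..., tools=tools, add_generation_prompt=True)` for the next generation. To follow the temperature tip, sampling can be enabled by passing `do_sample=True, temperature=0.3` to `model.generate`.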