---
base_model: EpistemeAI/DeepPhi-3.5-mini-instruct
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- llama-cpp
- gguf-my-repo
license: mit
language:
- en
---

# Triangle104/DeepPhi-3.5-mini-instruct-Q8_0-GGUF
This model was converted to GGUF format from [`EpistemeAI/DeepPhi-3.5-mini-instruct`](https://huggingface.co/EpistemeAI/DeepPhi-3.5-mini-instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/EpistemeAI/DeepPhi-3.5-mini-instruct) for more details on the model.

---
## Model Summary

A reasoning-focused Phi model, among the top-performing models for its size of 3.8B parameters. Phi-3 was trained on synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a 128K token context length.

## Run locally - 4bit

After obtaining the Phi-3.5-mini-instruct model checkpoint, users can use this sample code for inference.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig

torch.random.manual_seed(0)

model_path = "EpistemeAI/DeepPhi-3.5-mini-instruct"

# Configure 4-bit quantization using bitsandbytes
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # You can also try "fp4" if desired.
    bnb_4bit_compute_dtype=torch.float16,  # Or torch.bfloat16 depending on your hardware.
)

# Load the model in 4-bit and spread it across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": """
You are a helpful AI assistant. Respond in the following format:

...

...
"""},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

# Flatten the chat messages into a single plain-text prompt
def format_messages(messages):
    prompt = ""
    for msg in messages:
        role = msg["role"].capitalize()
        prompt += f"{role}: {msg['content']}\n"
    return prompt.strip()

prompt = format_messages(messages)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(prompt, **generation_args)
print(output[0]["generated_text"])
```

## Uploaded model

- **Developed by:** EpistemeAI
- **License:** apache-2.0
- **Finetuned from model:** unsloth/phi-3.5-mini-instruct-bnb-4bit

This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.
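The `--hf-repo`/`--hf-file` flags used below fetch the GGUF file from the Hub on first run. If you would rather download it ahead of time, here is a minimal sketch using the `huggingface-cli` tool from the `huggingface_hub` package (the `--local-dir .` target is just an example):

```bash
# Install the Hub CLI if it is not already available
pip install -U "huggingface_hub[cli]"

# Download only the Q8_0 GGUF file into the current directory
huggingface-cli download Triangle104/DeepPhi-3.5-mini-instruct-Q8_0-GGUF \
  deepphi-3.5-mini-instruct-q8_0.gguf --local-dir .
```

You can then point llama.cpp at the local file with `-m deepphi-3.5-mini-instruct-q8_0.gguf` instead of the `--hf-repo`/`--hf-file` flags.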
### CLI:
```bash
llama-cli --hf-repo Triangle104/DeepPhi-3.5-mini-instruct-Q8_0-GGUF --hf-file deepphi-3.5-mini-instruct-q8_0.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/DeepPhi-3.5-mini-instruct-Q8_0-GGUF --hf-file deepphi-3.5-mini-instruct-q8_0.gguf -c 2048
```

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/DeepPhi-3.5-mini-instruct-Q8_0-GGUF --hf-file deepphi-3.5-mini-instruct-q8_0.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo Triangle104/DeepPhi-3.5-mini-instruct-Q8_0-GGUF --hf-file deepphi-3.5-mini-instruct-q8_0.gguf -c 2048
```
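Once `llama-server` is running, it exposes an OpenAI-compatible HTTP API (by default on `http://localhost:8080`). A minimal sketch of a chat-completion request with `curl`; the prompt and `max_tokens` value here are only illustrative:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful AI assistant."},
          {"role": "user", "content": "Solve 2x + 3 = 7."}
        ],
        "max_tokens": 256
      }'
```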