---
license: mit
---

# **Phi-3.5-moe-mlx-int4**

Note: This is an unofficial version, intended for testing and development only.

This is an INT4-quantized model of Phi-3.5-MoE-Instruct, converted with the Apple MLX framework. You can deploy it on Apple Silicon devices (M1, M2, M3).

**Installation**

```bash
pip install -U mlx-lm
```

**Conversion**

```bash
python -m mlx_lm.convert --hf-path microsoft/Phi-3.5-MoE-instruct -q
```

The sample below loads the converted model from `./phi-3.5-moe-mlx-int4`, so either rename the conversion output folder accordingly or pass `--mlx-path ./phi-3.5-moe-mlx-int4` to `mlx_lm.convert`.

**Samples**

```python
from mlx_lm import load, generate

model, tokenizer = load("./phi-3.5-moe-mlx-int4")

sys_msg = """You are a helpful AI assistant; you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

- Blog: This tool helps you describe a certain knowledge point and content, and finally write it into Twitter or Facebook style content
- Translate: This is a tool that helps you translate into any language, using plain language as required

To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question "Build Multi Agents with MOE models" you must use the Blog tool like so:

{
    "tool_name": "Blog",
    "input": "Build Multi Agents with MOE models"
}

Or to translate the question "can you introduce yourself in Chinese" you must respond:

{
    "tool_name": "Translate",
    "input": "can you introduce yourself in Chinese"
}

Remember: just output the final result, in JSON format containing `"agentid"`, `"tool_name"`, `"input"` and `"output"` key-value pairs:

[
    {
        "agentid": "step1",
        "tool_name": "Blog",
        "input": "Build Multi Agents with MOE models",
        "output": "........."
    },
    {
        "agentid": "step2",
        "tool_name": "Translate",
        "input": "can you introduce yourself in Chinese",
        "output": "........."
    },
    {
        "agentid": "final",
        "tool_name": "Result",
        "output": "........."
    }
]

The user's request is as follows.
"""

query = 'Write something about Generative AI with MOE, translate it to Chinese'

prompt = tokenizer.apply_chat_template(
    [{"role": "system", "content": sys_msg}, {"role": "user", "content": query}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=1024, verbose=True)
```
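
Because the system prompt instructs the model to answer in JSON, you will usually want to parse the generated text. Here is a minimal sketch; a small quantized model is not guaranteed to emit valid JSON, so the parse is wrapped defensively:

```python
import json

# `response` is the text returned by `generate` in the sample above.
# The prompt asks for a JSON array of steps, but the model can drift
# from the format, so fail gracefully on malformed output.
try:
    steps = json.loads(response)
    for step in steps:
        print(step.get("agentid"), "-", step.get("tool_name"))
        print(step.get("output"))
except json.JSONDecodeError:
    print("Model did not return valid JSON; raw output:")
    print(response)
```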
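
For long generations like this one (max_tokens=1024), you can also stream tokens as they are produced instead of waiting for the full reply. This sketch assumes a recent mlx-lm release in which `stream_generate` yields response objects with a `.text` field; older releases yielded plain strings instead:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("./phi-3.5-moe-mlx-int4")

# Any chat-templated prompt works here; reuse `prompt` from the
# sample above, or build a minimal one as shown.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write something about Generative AI with MOE"}],
    tokenize=False,
    add_generation_prompt=True,
)

# Print each chunk as it arrives instead of waiting for the full reply.
for chunk in stream_generate(model, tokenizer, prompt, max_tokens=1024):
    print(chunk.text, end="", flush=True)
print()
```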