Very nice build, but can you also include the Linux Vulkan.so in the build?
I mean, metal.so only targets x86 macOS + Radeon GPU users, so usability is low. If you are an Apple macOS user, the Accelerate framework used by llama.cpp and Koboldcpp is a better code path now.
Windows Vulkan.so is less important, since Koboldcpp already handles combined CPU+GPU inference using GGML 4-bit, even in a single .exe file.
Linux Vulkan.so is important for AMD iGPU users (e.g. the 7840U, for which AMD's official website won't even offer Windows 11 drivers, let alone the ROCm/OpenCL drivers needed for Linux Koboldcpp inference). So the MLC-LLM Vulkan path is the only way for the 7840U (a nice 8 TFLOPS iGPU).
I am not opposed to building Vulkan.so myself.
Do you have scripts or commands that enabled you to build this? The MLC-LLM documentation isn't that great. I want to build just the vulkan.so file, without requantizing and resharding my model.
Hey @whatever1983, I'm happy to help here. Let me provide you the command:
python3 build.py --hf-path bigcode/gpt_bigcode-santacoder --target vulkan --quantization q4f16_0
I followed this guide: https://mlc.ai/mlc-llm/docs/compilation/compile_models.html