Very nice build, but can you also include the Linux Vulkan.so in the build?
I mean, metal.so only targets x86 macOS + Radeon GPU users, so usability is low. If you are an Apple macOS user, the Accelerate framework used by llama.cpp and Koboldcpp is a better code path now.
Windows Vulkan.so is less important, since Koboldcpp already handles combined CPU+GPU inference using GGML 4-bit, even in a single .exe file.
Linux Vulkan.so is important for AMD iGPU users (e.g. the 7840U, for which AMD's official website won't even offer Windows 11 drivers, let alone the ROCm/OpenCL drivers needed for Linux Koboldcpp inference). So the MLC-LLM Vulkan path is the only way for the 7840U (a nice 8 TFLOPS iGPU).
I am not opposed to building Vulkan.so myself.
Do you have scripts or commands that enabled you to build this? The MLC-LLM documentation isn't that great. I want to build just the vulkan.so file, without requantizing and resharding my model.
Hey @whatever1983, I'm happy to help here. Let me provide you the command:
python3 build.py --hf-path bigcode/gpt_bigcode-santacoder --target vulkan --quantization q4f16_0
I followed this guide: https://mlc.ai/mlc-llm/docs/compilation/compile_models.html