Where is the MLX version of qwen 2.5 Omni?
We all know we need a voice-to-voice model, and a video-to-voice model as well. Qwen 2.5 Omni has been out for a while now. When are we getting an MLX version of Qwen 2.5 Omni?
In MLX, probably only the text modality will be usable, and MLX is a niche format even compared to GGUF because of the "Apple tax" barrier. It's a very complex product that requires a huge investment of time to get working. MLX is kind of a solo piano; multimodal is a whole orchestra that needs a new room to play in.
Looking at https://github.com/ml-explore/mlx-examples, I have doubts about that assessment. If you're interested, check out the cifar and mnist examples for vision, or whisper for audio. The same applies to the comment about GGUF: https://github.com/ml-explore/mlx-examples/tree/main/llms/gguf_llm.
What is true is that mlx (currently) lacks support for the model type qwen2.5_omni, but not for qwen2.5 in general.
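For context on what "lacks support for the model type" means: loaders like mlx-lm typically pick a model implementation based on the `model_type` field in the checkpoint's `config.json`, and error out on unknown types. A minimal sketch of that dispatch pattern (the `SUPPORTED` set, the helper name, and the error message are illustrative assumptions, not mlx-lm's actual code):

```python
import json

# Hypothetical registry of known architectures; mlx-lm's real registry differs.
SUPPORTED = {"qwen2", "llama", "mistral"}

def check_support(config_json: str) -> str:
    """Return a load message for a supported model type, else raise ValueError."""
    model_type = json.loads(config_json)["model_type"]
    if model_type in SUPPORTED:
        return f"loading architecture '{model_type}'"
    # Unknown model types are rejected rather than silently mishandled.
    raise ValueError(f"Model type {model_type} not supported.")

# A plain Qwen 2.5 checkpoint dispatches fine...
print(check_support('{"model_type": "qwen2"}'))
# ...while an Omni-style checkpoint would hit the unsupported-type error.
try:
    check_support('{"model_type": "qwen2_5_omni"}')
except ValueError as e:
    print(e)
```

So supporting Omni isn't a matter of converting weights; it needs a new model implementation registered for that type.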