Any plan to release a vision-enabled version with the same, or nearly the same, base and instruct model?
As you can tell, we all really enjoy the model and the way it sounds, since it doesn't come across as robotic as other SOTA models. Enabling full vision capability later on, in the form of an update/upgrade, would be a very welcome addition and would let it replace even more use cases, especially if you follow the same design philosophy you used to create this amazing model.
Thanks. We do plan to give Kimi K2 vision capability. We already have the relevant technical expertise (see kimi-vl on our homepage), but it will still take some time.
Will it still act like this model, or will it be retrained entirely and act differently?
You can refer to our kimi-vl report: https://arxiv.org/abs/2504.07491. The behavior should be similar, but it cannot be exactly the same.
Thank you for your interest in the vision-enabled version. We will ship it once it meets our expectations at this scale, and we hope that when it arrives, it will not let you down.
As for a prototype of a small vision-language model, please refer to Kimi-VL-A3B: https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking-2506.
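In case it helps anyone who wants to try the prototype before K2 gains vision, here is a minimal sketch of loading Kimi-VL-A3B-Thinking-2506 with the generic transformers `AutoModelForCausalLM`/`AutoProcessor` pattern. This is an assumption-laden sketch, not the official quick-start: the exact preprocessing calls may differ from the model card, and the image path `demo.png` and prompt are placeholders.

```python
# Sketch: try the Kimi-VL-A3B-Thinking-2506 prototype via the generic
# transformers Auto* pattern. Assumes a recent transformers install and a
# local image at "demo.png" (placeholder); exact calls may differ from the
# official model card.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "moonshotai/Kimi-VL-A3B-Thinking-2506"
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

image = Image.open("demo.png")
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "demo.png"},
        {"type": "text", "text": "Describe this image."},
    ]}
]

# Render the chat template to a prompt string, then let the processor
# pack the text and image into model-ready tensors.
prompt = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding the reply.
reply = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(reply)
```

Whether this prototype covers agentic workflows is a separate question, which the next reply addresses.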
Hey @teowu @lsw825, is it feasible in your assessment to finetune moonshotai/Kimi-K2-Instruct or Base for vision? Or should the prototype VL model mentioned above be sufficient for general video tasks as a SOTA open-weights VL model as of today? I'm particularly interested in the agentic capabilities of K2, so I'm not sure whether Kimi-VL-A3B-Thinking-2506 will perform similarly and handle agentic tool usage well, but I have not yet tested it against such tasks.
No, we didn't optimize Kimi-VL-A3B-Thinking-2506 for agentic tool usage.
Are you planning to open-source Kimi's chat platform, with context management, OCR, etc.? Or not at all?