Any plans for a VL version?

#19
by kentrolla - opened

I'm using a LoRA-tuned Qwen3 at the moment, and it is by far the best model for my use case. However, image processing and understanding are a huge deal for me, so I'm wondering whether a VL model/version is planned.

I am basically building a system around a modified Qwen3 that pairs its internal reasoning with a self-managed quad memory system, a plethora of tools that the model decides on its own to use, and self-logging and self-assessment. The model is not just doing a task because it was assigned; it understands why it's doing something and follows a hierarchical structure to achieve the goal, and it can proactively message me with questions or requests for additional input. Post-task learning is handled through a lesson system based on how the task went overall, and the AI adapts based on these lessons to continually grow.

Currently I'm at the point where I've had to rely on external image-to-text, but it's intermittent. My AI has been able to complete complex tasks, such as using the GUI to access my PC, controlling the mouse, deciding on its own what to click and what to search for, and then analyzing the data (in a Chrome browser, for example). Everything is working and has been tested multiple times in different ways.

What would really skyrocket my project is true native image processing.
