so many multimodal releases these days
> ERNIE-4.5-VL: new vision language MoE models by Baidu https://huggingface.co/models?search=ernie-4.5-vl
> new visual document retrievers by NVIDIA (sota on ViDoRe!) nvidia/llama-nemoretriever-colembed-3b-v1 nvidia/llama-nemoretriever-colembed-1b-v1
> Ovis-3b: new image-text in image-text out models by Alibaba ⤵️ https://huggingface.co/spaces/AIDC-AI/Ovis-U1-
Ovis-U1 Collection: A unified model for multimodal understanding, text-to-image generation, and image editing. • 3 items • Updated 8 days ago