LLMs π¬ > Alibaba Qwen released WorldPM-72B, new World Preference Model trained with 15M preference samples (OS) > II-Medical-8B, new LLM for medical reasoning that comes in 8B by Intelligent-Internet > TRAIL is a new dataset by Patronus for trace error reasoning for agents (OS)
Multimodal πΌοΈπ¬ > Salesforce Research released BLIP3o, a new any-to-any model with image-text input and image-text output π¬itβs based on an image encoder, a text decoder and a DiT, and comes in 8B > They also released pre-training and fine-tuning datasets > MMMG is a multimodal generation benchmark for image, audio, text (interleaved)
Image Generation β―οΈ > Alibaba Wan-AI released Wan2.1-VACE, video foundation model for image and text to video, video-to-audio and more tasks, comes in 1.3B and 14B (OS) > ZuluVision released MoviiGen1.1, new cinematic video generation model based on Wan 2.1 14B (OS) > multimodalart released isometric-skeumorphic-3d-bnb, an isometric 3D asset generator (like AirBnB assets) based on Flux > LTX-Video-0.9.7-distilled is a new real-time video generation (text and image to video) model by Lightricks > Hidream_t2i_human_preference is a new text-to-image preference dataset by Rapidata with 195k human responses from 38k annotators
Audio π£οΈ > stabilityai released stable-audio-open-small new text-to-audio model > TEN-framework released ten-vad, voice activity detection model (OS)
> first reasoning model for robotics > based on Qwen 2.5-VL-7B, use with Hugging Face transformers or vLLM π€ > comes with SFT & alignment datasets and a new benchmark π