Sukesh Perla

hitchhiker3010

AI & ML interests

None yet

Recent Activity

liked a model 2 days ago
Gapeleon/bytedance_BAGEL-7B-MoT-INT8
updated a collection 2 days ago
AI Ads
liked a model 7 days ago
tencent/HunyuanVideo-Avatar

Organizations

Spaces-explorers, Hugging Face Discord Community, open/ acc

hitchhiker3010's activity

reacted to merve's post with 🔥 17 days ago
It was the week of video generation at @huggingface, on top of many new LLMs, VLMs and more!
Let's have a wrap 🌯 merve/may-16-releases-682aeed23b97eb0fe965345c

LLMs 💬
> Alibaba Qwen released WorldPM-72B, new World Preference Model trained with 15M preference samples (OS)
> II-Medical-8B, new LLM for medical reasoning that comes in 8B by Intelligent-Internet
> TRAIL is a new dataset by Patronus for trace error reasoning for agents (OS)

Multimodal 🖼️💬
> Salesforce Research released BLIP3o, a new any-to-any model with image-text input and image-text output 💬 it's based on an image encoder, a text decoder and a DiT, and comes in 8B
> They also released pre-training and fine-tuning datasets
> MMMG is a multimodal generation benchmark for image, audio, text (interleaved)

Image Generation ⏯️
> Alibaba Wan-AI released Wan2.1-VACE, video foundation model for image and text to video, video-to-audio and more tasks, comes in 1.3B and 14B (OS)
> ZuluVision released MoviiGen1.1, new cinematic video generation model based on Wan 2.1 14B (OS)
> multimodalart released isometric-skeumorphic-3d-bnb, an isometric 3D asset generator (like AirBnB assets) based on Flux
> LTX-Video-0.9.7-distilled is a new real-time video generation (text and image to video) model by Lightricks
> Hidream_t2i_human_preference is a new text-to-image preference dataset by Rapidata with 195k human responses from 38k annotators

Audio 🗣️
> stabilityai released stable-audio-open-small, a new text-to-audio model
> TEN-framework released ten-vad, voice activity detection model (OS)

reacted to merve's post with 🔥 17 days ago
NVIDIA released new vision reasoning model for robotics: Cosmos-Reason1-7B 🤖 nvidia/cosmos-reason1-67c9e926206426008f1da1b7

> first reasoning model for robotics
> based on Qwen 2.5-VL-7B, use with Hugging Face transformers or vLLM 🤗
> comes with SFT & alignment datasets and a new benchmark