Sukesh Perla

hitchhiker3010

hitchhiker3010's activity

liked a model 2 days ago

Gapeleon/bytedance_BAGEL-7B-MoT-INT8

Any-to-Any • Updated 12 days ago • 397 • 23

updated a collection 2 days ago

AI Ads

Collection

4 items • Updated 2 days ago

liked a model 7 days ago

tencent/HunyuanVideo-Avatar

Image-to-Video • Updated 9 days ago • 158

liked a Space 7 days ago

10.3k

AI Comic Factory

👩

Create your own AI comic with a single prompt

upvoted a collection 7 days ago

X2I Dataset

Collection

Datasets used in OmniGen-v1. (v2 is coming soon :) ) • 5 items • Updated Apr 28 • 18

updated a collection 7 days ago

AI Ads

Collection

4 items • Updated 2 days ago

liked a Space 9 days ago

495

Wan2.1 Fast

🎥

Turn static images into animated videos

liked a model 17 days ago

ByteDance-Seed/BAGEL-7B-MoT

Any-to-Any • Updated 15 days ago • 9.94k • 979

reacted to merve's post with 🔥 17 days ago

Post

2575

It was the week of video generation at @huggingface, on top of many new LLMs, VLMs and more!
Let’s have a wrap 🌯 merve/may-16-releases-682aeed23b97eb0fe965345c

LLMs 💬
> Alibaba Qwen released WorldPM-72B, a new World Preference Model trained with 15M preference samples (OS)
> II-Medical-8B, new LLM for medical reasoning that comes in 8B by Intelligent-Internet
> TRAIL is a new dataset by Patronus for trace error reasoning for agents (OS)

Multimodal 🖼️💬
> Salesforce Research released BLIP3o, a new any-to-any model with image-text input and image-text output 💬 it’s based on an image encoder, a text decoder and a DiT, and comes in 8B
> They also released pre-training and fine-tuning datasets
> MMMG is a multimodal generation benchmark for image, audio, text (interleaved)

Image Generation ⏯️
> Alibaba Wan-AI released Wan2.1-VACE, a video foundation model for image and text to video, video-to-audio and more tasks, comes in 1.3B and 14B (OS)
> ZuluVision released MoviiGen1.1, a new cinematic video generation model based on Wan 2.1 14B (OS)
> multimodalart released isometric-skeumorphic-3d-bnb, an isometric 3D asset generator (like AirBnB assets) based on Flux
> LTX-Video-0.9.7-distilled is a new real-time video generation (text and image to video) model by Lightricks
> Hidream_t2i_human_preference is a new text-to-image preference dataset by Rapidata with 195k human responses from 38k annotators

Audio 🗣️
> stabilityai released stable-audio-open-small, a new text-to-audio model
> TEN-framework released ten-vad, a voice activity detection model (OS)

reacted to merve's post with 🔥 17 days ago

Post

1713

NVIDIA released a new vision reasoning model for robotics: Cosmos-Reason1-7B 🤖 nvidia/cosmos-reason1-67c9e926206426008f1da1b7

> first reasoning model for robotics
> based on Qwen 2.5-VL-7B, use with Hugging Face transformers or vLLM 🤗
> comes with SFT & alignment datasets and a new benchmark 👏