Yatharth Sharma

YaTharThShaRma999

AI & ML interests

None yet

Recent Activity

liked a model 9 days ago

BAAI/MTVCraft

reacted to chintankp's post with 👍 13 days ago

We are incredibly proud to be among the top contributors to the AI community with our leading open models and permissive licensed datasets for reasoning, Physical AI, speech, vision, and more. See huggingface.co/spaces/cfahlgren1/org-activity-heatmap Download the latest models and datasets from huggingface.co/nvidia

reacted to a-r-r-o-w's post with 🚀 19 days ago

As you might have already heard, FLUX.1-Kontext-dev is now released and has taken the generative community by storm! In case you haven't come across it, you can get started with Kontext using 🤗 diffusers. See the official [model](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) and [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#flux). Want to know how inference companies like Fal & Replicate are able to run the model so fast, in under 2 seconds per image? Check out this [gist](https://gist.github.com/a-r-r-o-w/d08c37e8bd3e9c26b4ce80360be148c6) for some details!

Organizations

None yet

upvoted a paper 19 days ago

PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling

Paper • 2506.20936 • Published 21 days ago • 11

upvoted an article 25 days ago

Article

Introducing Cosmos Predict-2: A Foundation For Your Own World Model

and 2 others • about 1 month ago • 8

upvoted a paper about 1 month ago

CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech

Paper • 2506.02863 • Published Jun 3 • 8

upvoted 2 papers about 2 months ago

Text Generation Beyond Discrete Token Sampling

Paper • 2505.14827 • Published May 20 • 10

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 216

upvoted 4 papers 2 months ago

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 95

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 84

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Paper • 2505.02625 • Published May 5 • 22

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

Paper • 2505.02471 • Published May 5 • 12

upvoted a paper 3 months ago

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

Paper • 2504.20690 • Published Apr 29 • 20

upvoted a collection 3 months ago

HiDream-E1

Collection

A collection of HiDream-E1 models. • 2 items • Updated Apr 28 • 4

upvoted 3 papers 3 months ago

Kimi-Audio Technical Report

Paper • 2504.18425 • Published Apr 25 • 19

Compass Control: Multi Object Orientation Control for Text-to-Image Generation

Paper • 2504.06752 • Published Apr 9 • 10

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Paper • 2504.07960 • Published Apr 10 • 49

upvoted a collection 3 months ago

GLM-4-0414

Collection

GLM-4-0414 series model • 8 items • Updated 16 days ago • 129

upvoted 4 papers 4 months ago

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

Paper • 2503.01183 • Published Mar 3 • 28

upvoted a paper 5 months ago

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation

Paper • 2502.20583 • Published Feb 27 • 13
