Reza Sayar

Reza2kn

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

liked a model 1 day ago

FreedomIntelligence/Janus-4o-7B

upvoted a paper 1 day ago

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Organizations

upvoted 2 papers 1 day ago

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

Paper • 2506.18088 • Published 5 days ago • 16

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published 5 days ago • 59

upvoted an article 1 day ago

Gemma 3n fully available in the open-source ecosystem!

Article • and 7 others • 2 days ago • 70

upvoted 5 papers 2 days ago

LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning

Paper • 2506.18841 • Published 4 days ago • 48

AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

Paper • 2506.19851 • Published 3 days ago • 50

upvoted a collection 16 days ago

V-JEPA 2

Collection

A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated 14 days ago • 128

upvoted an article 16 days ago

Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm

Article • and 4 others • 16 days ago • 63

upvoted a collection 17 days ago

Open Whisper-style Speech Models (OWSM)

Collection

Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/ • 21 items • Updated 25 days ago • 6

upvoted an article 17 days ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Article • and 8 others • 25 days ago • 167

upvoted 4 papers 17 days ago

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published 20 days ago • 105

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Paper • 2506.07491 • Published 19 days ago • 38

BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Paper • 2506.07530 • Published 19 days ago • 18

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Paper • 2506.07986 • Published 18 days ago • 18

upvoted a paper 24 days ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published 25 days ago • 103

upvoted 2 papers 25 days ago

EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge

Paper • 2505.23009 • Published 30 days ago • 17

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning

Paper • 2506.00338 • Published 28 days ago • 9

upvoted an article 26 days ago

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Article • and 6 others • May 21 • 173
