Min-Hung Chen

cmhungsteve

https://minhungchen.netlify.app/

AI & ML interests

Multimodal AI, Transfer Learning, Unsupervised Learning, Video Understanding, Vision Transformer, Computer Vision, Deep Learning

Recent Activity

liked a dataset 13 days ago

nvidia/R4D-Bench

liked a dataset 14 days ago

chanhee-luke/RoboSpatial-Home

upvoted a paper 14 days ago

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

authored 3 papers 3 months ago

3AM: Segment Anything with Geometric Consistency in Videos

Paper • 2601.08831 • Published Jan 13 • 34

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Paper • 2601.09708 • Published Jan 14 • 54

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 230

authored a paper 4 months ago

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Paper • 2512.17012 • Published Dec 18, 2025 • 48

submitted a paper to Daily Papers 4 months ago

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Paper • 2512.17012 • Published Dec 18, 2025 • 48

authored 2 papers 4 months ago

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Paper • 2512.14273 • Published Dec 16, 2025 • 10

BlurDM: A Blur Diffusion Model for Image Deblurring

Paper • 2512.03979 • Published Dec 3, 2025 • 5

authored a paper 5 months ago

VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models

Paper • 2511.07299 • Published Nov 10, 2025 • 9

authored 4 papers 6 months ago

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16, 2025 • 18

TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control

Paper • 2510.09561 • Published Oct 10, 2025 • 9

Temporal Prompting Matters: Rethinking Referring Video Object Segmentation

Paper • 2510.07319 • Published Oct 8, 2025 • 3

LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models

Paper • 2510.03232 • Published Oct 3, 2025 • 1

authored a paper 7 months ago

V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts

Paper • 2509.18053 • Published Sep 22, 2025 • 4

authored 7 papers 8 months ago

CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models

Paper • 2501.02355 • Published Jan 4, 2025 • 1

ORFormer: Occlusion-Robust Transformer for Accurate Facial Landmark Detection

Paper • 2412.13174 • Published Dec 17, 2024 • 1

Spatio-Temporal Context Prompting for Zero-Shot Action Detection

Paper • 2408.15996 • Published Aug 28, 2024 • 1

GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation

Paper • 2406.12834 • Published Jun 18, 2024 • 1
