
Boyuan Sun

BoyuanSun

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 11 days ago

Depth Anything at Any Condition

Paper • 2507.01634 • Published 12 days ago • 46

liked 2 models 14 days ago

BBBBCHAN/LLaVA-Scissor-baseline-0.5B

Video-Text-to-Text • 0.9B • Updated 13 days ago • 58 • 4

BBBBCHAN/LLaVA-Scissor-baseline-7B

Video-Text-to-Text • 8B • Updated 13 days ago • 41 • 3

upvoted a paper 14 days ago

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Paper • 2506.21862 • Published 17 days ago • 35

upvoted 3 papers about 2 months ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published Apr 10 • 32

Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 82

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 215

upvoted a paper 3 months ago

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Paper • 2504.07960 • Published Apr 10 • 49

upvoted a paper 6 months ago

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Paper • 2501.03218 • Published Jan 6 • 37

upvoted 10 papers 7 months ago

DepthLab: From Partial to Complete

Paper • 2412.18153 • Published Dec 24, 2024 • 37

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 369

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 146

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 99

CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation

Paper • 2306.04300 • Published Jun 7, 2023 • 2

VideoLLM-online: Online Video Large Language Model for Streaming Video

Paper • 2406.11816 • Published Jun 17, 2024 • 25

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 104

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 160

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 117

MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms

Paper • 2410.18977 • Published Oct 24, 2024 • 15