Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published May 6, 2025 • 92
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance Paper • 2504.06232 • Published Apr 8, 2025 • 14
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7, 2025 • 124
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25, 2025 • 73
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published Feb 18, 2025 • 42
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion Paper • 2502.08590 • Published Feb 12, 2025 • 44
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 7, 2025 • 65
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published Jan 21, 2025 • 46
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Paper • 2501.05510 • Published Jan 9, 2025 • 44
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Paper • 2501.03226 • Published Jan 6, 2025 • 45
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction Paper • 2501.03218 • Published Jan 6, 2025 • 37
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 99
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Paper • 2412.07674 • Published Dec 10, 2024 • 20
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models Paper • 2412.01824 • Published Dec 2, 2024 • 66
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models Paper • 2410.17637 • Published Oct 23, 2024 • 37