7 31 131

Jaeyoon Jung

lastdefiance20

AI & ML interests

multimodal

Recent Activity

upvoted a paper 13 days ago

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

liked a model about 1 month ago

openbmb/MiniCPM4-8B

liked a model about 1 month ago

mistralai/Magistral-Small-2506

View all activity

Organizations

upvoted a paper 13 days ago

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Paper • 2507.07990 • Published 14 days ago • 44

upvoted 2 papers about 2 months ago

Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation

Paper • 2505.18842 • Published May 24 • 37

Let's Predict Sentence by Sentence

Paper • 2505.22202 • Published May 28 • 18

upvoted a paper 2 months ago

Visual Planning: Let's Think Only with Images

Paper • 2505.11409 • Published May 16 • 57

upvoted 2 papers 3 months ago

TesserAct: Learning 4D Embodied World Models

Paper • 2504.20995 • Published Apr 29 • 21

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Paper • 2504.15279 • Published Apr 21 • 75

upvoted a collection 3 months ago

Qwen3

Collection

74 items • Updated 3 days ago • 907

upvoted 5 papers 4 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 193

R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model

Paper • 2503.05132 • Published Mar 7 • 58

KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language

Paper • 2503.23730 • Published Mar 31 • 4

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 53

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

Paper • 2503.16365 • Published Mar 20 • 41

upvoted 4 papers 5 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 146

upvoted 2 papers 6 months ago

Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 88

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published Jan 7 • 53

upvoted 2 papers 7 months ago

LearnLM: Improving Gemini for Learning

Paper • 2412.16429 • Published Dec 21, 2024 • 22

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published Dec 16, 2024 • 59

Jaeyoon Jung

AI & ML interests

Recent Activity

Organizations

lastdefiance20's activity