Daily Papers

by AK and the research community

May 15

Submitted by

jiuhai

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

·
13 authors

3

Submitted by

xiaomoguhzz

DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

·
6 authors

3

Submitted by

nielsr

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

·
15 authors

Submitted by

scikkk

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

·
11 authors

Submitted by

toshas

Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis

·
8 authors

2

Submitted by

HanjungKim

UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

·
6 authors

2

Submitted by

NadMag

LightLab: Controlling Light Sources in Images with Diffusion Models

·
7 authors

3

Submitted by

tarsur909

SweRank: Software Issue Localization with Code Ranking

·
10 authors

Submitted by

akhaliq

CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image

·
9 authors

Submitted by

novateur

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

·
14 authors

3

Submitted by

h9LtLSb

Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?

·
7 authors

2

Submitted by

pritamqu

VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models

·
2 authors

2

Submitted by

peihaowang

Steepest Descent Density Control for Compact 3D Gaussian Splatting

·
11 authors

Submitted by

kailassrt

DetReIDX: A Stress-Test Dataset for Real-World UAV-Based Person Recognition

·
11 authors

2

Submitted by

kkr5155

Behind Maya: Building a Multilingual Vision Language Model

·
19 authors

2

Submitted by

JadeCheng

Visually Interpretable Subtask Reasoning for Visual Question Answering

·
3 authors

2

Submitted by

kkr5155

Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

·
4 authors

2