ImgEdit: A Unified Image Editing Dataset and Benchmark Paper • 2505.20275 • Published 11 days ago • 17
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Paper • 2504.20690 • Published Apr 29 • 19
LightLab: Controlling Light Sources in Images with Diffusion Models Paper • 2505.09608 • Published 23 days ago • 31
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated Apr 12 • 65
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 22
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 112
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published Aug 26, 2024 • 39
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting Paper • 2408.11706 • Published Aug 21, 2024 • 7
Q-Ground: Image Quality Grounding with Large Multi-modality Models Paper • 2407.17035 • Published Jul 24, 2024 • 1
Magpie-Qwen2 Datasets Collection Dataset built with Qwen2 72B and Qwen2 7B. • 6 items • Updated Jan 13 • 10
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22, 2024 • 20
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated May 5 • 227
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5, 2024 • 57
InternVL2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 15 items • Updated Apr 20 • 89
CMC-Bench: Towards a New Paradigm of Visual Signal Compression Paper • 2406.09356 • Published Jun 13, 2024 • 5