- InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
  Paper • 2404.06512 • Published • 30
- Adapting LLaMA Decoder to Vision Transformer
  Paper • 2404.06773 • Published • 18
- Quantized Visual Geometry Grounded Transformer
  Paper • 2509.21302 • Published • 8
- Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
  Paper • 2509.24335 • Published • 6
Ramanana Rahary
AdrienRR
Recent Activity
updated a collection 12 days ago: Vision
updated a collection 15 days ago: Vision
updated a collection 15 days ago: Vision