PreciseCam: Precise Camera Control for Text-to-Image Generation

Edurne Bernal-Berdun¹, Ana Serrano¹, Belen Masia¹, Matheus Gadelha², Yannick Hold-Geoffroy², Xin Sun², Diego Gutierrez¹

¹Universidad de Zaragoza - I3A, ²Adobe Research

πŸ“… IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

Abstract

Images as an artistic medium often rely on specific camera angles and lens distortions to convey ideas or emotions; however, such precise control is missing in current text-to-image models. We propose an efficient and general solution that allows precise control over the camera when generating both photographic and artistic images. Unlike prior methods that rely on predefined shots, we rely solely on four simple extrinsic and intrinsic camera parameters, removing the need for pre-existing geometry, reference 3D objects, and multi-view data. We also present a novel dataset with more than 57,000 images, along with their text prompts and ground-truth camera parameters. Our evaluation shows precise camera control in text-to-image generation, surpassing traditional prompt engineering approaches.

πŸ”— πŸ“„ Paper on arXiv | 🌐 Project Page


πŸ“¦ Model Access

The model is available on Hugging Face: edurnebb/PreciseCam

NOTE: We offer a public model that differs from the one used in the paper. While results may vary, the overall behavior remains consistent.
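The weights can be fetched programmatically with the `huggingface_hub` library. The snippet below is a minimal download sketch, assuming only that the `edurnebb/PreciseCam` repository listed above is publicly accessible; the exact inference pipeline depends on the checkpoint format and is not shown here.

```python
# Minimal sketch: download the public PreciseCam weights from Hugging Face.
# Assumes the huggingface_hub package is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

# Downloads all repository files into the local cache and returns the local path.
local_dir = snapshot_download(repo_id="edurnebb/PreciseCam")
print(local_dir)
```

From there, the downloaded files can be loaded with whatever pipeline matches the released checkpoint format.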
