RoMa v2: Harder Better Faster Denser Feature Matching
Abstract
A novel dense feature matching model using a custom architecture and loss, combined with DINOv3, achieves state-of-the-art accuracy and efficiency.
Dense feature matching aims to estimate all correspondences between two images of a 3D scene and has recently been established as the gold standard due to its high accuracy and robustness. However, existing dense matchers still fail or perform poorly in many hard real-world scenarios, and high-precision models are often slow, limiting their applicability. In this paper, we attack these weaknesses on a wide front through a series of systematic improvements that together yield a significantly better model. In particular, we construct a novel matching architecture and loss, which, combined with a curated diverse training distribution, enables our model to solve many complex matching tasks. We further make training faster through a decoupled two-stage matching-then-refinement pipeline, and at the same time significantly reduce refinement memory usage through a custom CUDA kernel. Finally, we leverage the recent DINOv3 foundation model, along with multiple other insights, to make the model more robust and unbiased. In an extensive set of experiments, we show that the resulting novel matcher sets a new state-of-the-art, being significantly more accurate than its predecessors. Code is available at https://github.com/Parskatt/romav2
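The abstract does not spell out the architecture, but a rough sketch of the decoupled two-stage design it describes (coarse matching, then refinement, over frozen foundation-model features such as DINOv3) might look like the following PyTorch code. This is an illustrative assumption, not the authors' implementation: the module names (`CoarseMatcher`, `Refiner`), feature dimensions, and the soft-argmax matching head are all hypothetical stand-ins.

```python
# Hypothetical sketch of a decoupled matching-then-refinement pipeline.
# A frozen backbone (DINOv3 in the paper) is assumed to supply the features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoarseMatcher(nn.Module):
    """Stage 1 (illustrative): coarse dense warp from frozen backbone features."""

    def __init__(self, feat_dim: int = 384, proj_dim: int = 256):
        super().__init__()
        self.proj = nn.Conv2d(feat_dim, proj_dim, kernel_size=1)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
        fa = F.normalize(self.proj(feats_a), dim=1)  # (B, C, H, W)
        fb = F.normalize(self.proj(feats_b), dim=1)
        b, c, h, w = fa.shape
        # Global correlation between all pairs of feature locations.
        corr = torch.einsum("bchw,bcij->bhwij", fa, fb).reshape(b, h, w, h * w)
        prob = corr.softmax(dim=-1)  # match distribution over image B
        # Soft-argmax over image B's grid gives a coarse warp in [-1, 1] coords.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=fa.device),
            torch.linspace(-1, 1, w, device=fa.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=-1).reshape(h * w, 2)  # (H*W, 2)
        return prob @ grid  # (B, H, W, 2): for each pixel in A, a location in B


class Refiner(nn.Module):
    """Stage 2 (illustrative): small residual offsets around the coarse warp."""

    def __init__(self, feat_dim: int = 384):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(2 * feat_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 2, kernel_size=3, padding=1),
        )

    def forward(self, feats_a, feats_b, warp):
        # Sample image B's features at the coarsely matched locations.
        sampled_b = F.grid_sample(feats_b, warp, align_corners=False)
        delta = self.head(torch.cat((feats_a, sampled_b), dim=1))  # (B, 2, H, W)
        return warp + delta.permute(0, 2, 3, 1)


# Usage sketch with dummy features standing in for a frozen backbone's output.
feats_a = torch.randn(1, 384, 32, 32)
feats_b = torch.randn(1, 384, 32, 32)
warp = CoarseMatcher()(feats_a, feats_b)
matches = Refiner()(feats_a, feats_b, warp)  # dense A -> B correspondences
```

Because the two stages share no learned state in this sketch, each can be trained in isolation, which is consistent with the abstract's claim that decoupling matching from refinement speeds up training; the refinement stage, which samples and correlates high-resolution features, is plausibly where the paper's custom memory-reducing CUDA kernel would apply.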
Community
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- SegMASt3R: Geometry Grounded Segment Matching (2025)
- GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization (2025)
- LightGlueStick: a Fast and Robust Glue for Joint Point-Line Matching (2025)
- PointSt3R: Point Tracking through 3D Grounded Correspondence (2025)
- WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting (2025)
- SingRef6D: Monocular Novel Object Pose Estimation with a Single RGB Reference (2025)
- OPFormer: Object Pose Estimation leveraging foundation model with geometric encoding (2025)