Adaptive Riemannian optimization for multi-scale diffeomorphic matching

Article https://doi.org/10.1038/s41467-026-72508-3

Rohit Jena 1,2, Pratik Chaudhari 1,3 & James C. Gee 1,2,4

Received: 5 April 2025
Accepted: 15 April 2026

Image matching is a fundamental task in quantitative biomedical and biological image analyses, enabling researchers to compare, integrate, and interpret imaging data across subjects, time points, modalities, and experimental conditions. Existing state-of-the-art registration methods are slow due to inefficient implementations and poor convergence rates because of the ill-conditioned nature of the optimization problem. Deep learning methods offer fast inference but require extensive training time and substantial inference memory, and fail to generalize across long-tailed distributions or diverse image modalities, necessitating costly retraining. We address these challenges by proposing FireANTs, a training-free, GPU-accelerated, multi-scale adaptive Riemannian optimization algorithm for fast and accurate dense diffeomorphic image matching. FireANTs more than doubles the speed of the community standard ANTs registration tool on a CPU, and is two orders of magnitude faster on a GPU. On the GPU, FireANTs performs competitively with deep learning methods on inference runtime while consuming up to 10× less memory. FireANTs demonstrates robustness on a wide variety of matching problems across modalities, species, and organs, without any domain-specific training or tuning. Our framework allows hyperparameter grid search studies with fewer resources and less time than traditional and deep learning registration algorithms alike.

The ability to identify and map corresponding elements across diverse datasets or perceptual inputs, known as correspondence matching, is fundamental to interpreting and interacting with the world.
Correspondence matching between images is one of the longstanding fundamental problems in computer vision. Influential computer vision researcher Takeo Kanade famously once said that the three fundamental problems of computer vision are: “Correspondence, correspondence, correspondence”1. Indeed, correspondence matching is fundamental and ubiquitous across various disciplines, manifesting in many forms including but not limited to stereo matching2, structure from motion3,4, template matching5, motion tracking6,7, shape correspondence8, semantic correspondence9, point cloud matching10, optical flow11, and deformable image matching12. Solving these problems addresses the desiderata for a wide range of applications in computer vision, robotics, medical imaging, remote sensing, photogrammetry, geological and ecological sciences, cognitive sciences, human-computer interaction, and self-driving, among many other fields.

Correspondence matching is broadly divided into two categories: sparse and dense matching. Most sparse matching problems, like stereo matching, structure from motion, and template matching, involve finding a sparse set of salient features across images followed by matching them. In such cases, the transformation between images, surfaces, or point clouds is typically also parameterized with a small number of parameters, e.g., an affine transform, homography, or a fundamental matrix. These methods are often robust to noise and occlusions, and salient features can be detected and matched efficiently via analytical closed forms.

1 Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA. 2 Penn Image Computing and Science Laboratory, University of Pennsylvania, Philadelphia, PA, USA. 3 Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA. 4 Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Nature Communications | (2026)17:4774

In contrast, dense matching is much harder because the entire image is considered for matching and cannot be reduced to a sparse set of salient features, and the transformation between images is typically parameterized with a large number of parameters, e.g., a dense deformation field. Moreover, dense matching is sensitive to local noise and cannot be solved efficiently via analytical closed forms, necessitating iterative optimization methods13–18. Because of their dense and high-dimensional nature, these methods are often plagued by ill-posedness12,19,20, difficulty in optimization, inefficient implementations, and lack of scalability to high-resolution data.

In this work, we focus on dense deformable correspondence matching, which is the non-linear and local (hence deformable) alignment of two or more images into a common coordinate system. Dense deformable correspondence matching is a fundamental problem in computer vision21, medical imaging22–24, microscopy25,26, and remote sensing. Here, we focus on applications in biomedical and biological imaging. In the biomedical and biological sciences, deformable correspondence matching is also referred to as deformable registration. Within dense deformations, diffeomorphisms are of special interest: they are invertible transformations such that both the transform and its inverse are differentiable. This allows us to accurately model the correspondence between images while ensuring that the topological structure of the anatomy is preserved, i.e., no tearing or folding of the anatomy is introduced.

We address two fundamental problems in dense correspondence matching: ill-conditioning and scalability. The ill-conditioning arises from the high-dimensional and heterogeneous nature of the dense matching optimization objective, and can be mitigated by adaptive optimization methods.
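The no-folding property of diffeomorphisms described above can be checked numerically on a discretized map: if the Jacobian determinant of the transformation is positive on every grid cell, the map is locally invertible and orientation-preserving there. The following is a minimal sketch in plain Python on a toy 2D grid, not the FireANTs implementation. The function names (`make_warp`, `jacobian_determinants`, `count_folds`), the sinusoidal test warp, and the forward-difference discretization are all assumptions chosen for illustration, and a positive determinant everywhere is a necessary check rather than a proof of global invertibility.

```python
import math


def make_warp(n, amplitude):
    """Toy map phi(i, j) = (i + amplitude * sin(2*pi*i/n), j) on an n x n grid.

    Hypothetical test warp for illustration only; amplitude 0 gives the identity.
    """
    phi_x = [[i + amplitude * math.sin(2.0 * math.pi * i / n) for j in range(n)]
             for i in range(n)]
    phi_y = [[float(j) for j in range(n)] for i in range(n)]
    return phi_x, phi_y


def jacobian_determinants(phi_x, phi_y):
    """Forward-difference Jacobian determinant of phi on each grid cell.

    Assumes unit grid spacing; returns a (rows - 1) x (cols - 1) nested list.
    """
    rows, cols = len(phi_x), len(phi_x[0])
    dets = []
    for i in range(rows - 1):
        row = []
        for j in range(cols - 1):
            # Partial derivatives of each output coordinate along each grid axis.
            dx_di = phi_x[i + 1][j] - phi_x[i][j]
            dx_dj = phi_x[i][j + 1] - phi_x[i][j]
            dy_di = phi_y[i + 1][j] - phi_y[i][j]
            dy_dj = phi_y[i][j + 1] - phi_y[i][j]
            row.append(dx_di * dy_dj - dx_dj * dy_di)
        dets.append(row)
    return dets


def count_folds(dets):
    """Number of cells whose determinant is non-positive (folding or collapse)."""
    return sum(1 for row in dets for d in row if d <= 0.0)


if __name__ == "__main__":
    for amplitude in (0.0, 1.0, 4.0):
        dets = jacobian_determinants(*make_warp(16, amplitude))
        smallest = min(min(row) for row in dets)
        print(f"amplitude={amplitude}: min det={smallest:.3f}, "
              f"folded cells={count_folds(dets)}")
```

On this toy grid, the identity and the small-amplitude warp keep every determinant positive, while the large-amplitude warp compresses some cells past the point of inversion and produces non-positive determinants, which is the folding that a diffeomorphic parameterization is meant to rule out.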
Although standard adaptive optimization methods27,28 are shown to work in fixed Euclidean spaces, it is not obvious how to extend this formulation to the non-Euclidean space of diffeomorphisms. Fortunately, diffeomorphisms admit many interesting mathematical properties, such as being embedded in a Riemannian manifold, having a Lie group structure, and local geodesic formulations, that can be exploited for adaptive optimization. We present a mathematically rigorous framework for adaptive optimization of diffeomorphic matching (Section “Exploiting the group structure of diffeomorphisms”). This is done by exploiting the group structure of diffeomorphisms to define a custom gradient descent algorithm, followed by adaptive optimization on this space.

Second, we observe that most existing state-of-the-art methods are prohibitively slow for high-resolution data, which limits their applicability to rigorous hyperparameter studies, large-scale data, or high-resolution alignment at mesoscopic or microscopic resolutions. Our meticulously implemented operational contributions lead to an algorithm that is around 2–7× faster than state-of-the-art optimization toolkits on CPU, and up to three orders of magnitud (...truncated)