Adaptive Riemannian optimization for multi-scale diffeomorphic matching
Article
https://doi.org/10.1038/s41467-026-72508-3
Received: 5 April 2025
Rohit Jena1,2, Pratik Chaudhari1,3 & James C. Gee1,2,4
Accepted: 15 April 2026
Image matching is a fundamental task in quantitative biomedical and biological image analyses, enabling researchers to compare, integrate, and interpret
imaging data across subjects, time points, modalities, and experimental conditions. Existing state-of-the-art registration methods are slow because of inefficient implementations and converge poorly because the optimization problem is ill-conditioned. Deep learning methods offer
fast inference but require extensive training time and substantial inference
memory, and they fail to generalize across long-tailed distributions or diverse
image modalities, necessitating costly retraining. We address these challenges
by proposing FireANTs, a training-free, GPU-accelerated, multi-scale adaptive
Riemannian optimization algorithm for fast and accurate dense diffeomorphic
image matching. FireANTs more than doubles the speed of the community
standard ANTs registration tool on a CPU, and is two orders of magnitude
faster on a GPU. On the GPU, FireANTs performs competitively with deep
learning methods on inference runtime while consuming up to 10× less
memory. FireANTs demonstrates robustness on a wide variety of matching
problems across modalities, species, and organs, without any domain-specific
training or tuning. Our framework allows hyperparameter grid search studies
with fewer resources and less time than traditional and deep learning
registration algorithms alike.
The ability to identify and map corresponding elements across diverse
datasets or perceptual inputs—known as correspondence matching—is
fundamental to interpreting and interacting with the world. Correspondence matching between images is one of the longstanding fundamental problems in computer vision. The influential computer vision
researcher Takeo Kanade famously said that the three fundamental
problems of computer vision are: “Correspondence, correspondence,
correspondence”1. Indeed, correspondence matching is fundamental
and ubiquitous across various disciplines, manifesting in many forms
including but not limited to stereo matching2, structure from motion3,4,
template matching5, motion tracking6,7, shape correspondence8,
semantic correspondence9, point cloud matching10, optical flow11, and
deformable image matching12. Solving these problems addresses the
desiderata for a wide range of applications in computer vision, robotics,
medical imaging, remote sensing, photogrammetry, geological and
ecological sciences, cognitive sciences, human-computer interaction,
and self-driving, among many other fields.
Correspondence matching is broadly divided into two categories:
sparse and dense matching. Most sparse matching problems, like
stereo matching, structure from motion, and template matching,
involve finding a sparse set of salient features across images followed
by matching them. In such cases, the transformation between images,
surfaces, or point clouds is typically also parameterized with a small
number of parameters, e.g., an affine transform, homography, or a
fundamental matrix. These methods are often robust to noise and occlusions, and salient features can be detected and matched efficiently via
1Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA. 2Penn Image Computing and Science Laboratory, University of Pennsylvania, Philadelphia, PA, USA. 3Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA. 4Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Nature Communications | (2026)17:4774
analytical closed forms. In contrast, dense matching is much harder
because the entire image is considered for matching and cannot be
reduced to a sparse set of salient features, and the transformation
between images is typically parameterized with a large number of
parameters, e.g., a dense deformation field. Moreover, dense matching
is sensitive to local noise, and cannot be solved efficiently via analytical
closed forms, necessitating iterative optimization methods13–18. Due to
their dense and high-dimensional nature, these methods are often plagued by ill-posedness12,19,20, difficulty in optimization, inefficient
implementations, and lack of scalability to high-resolution data.
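The contrast between sparse and dense matching can be made concrete with a minimal sketch. It assumes a handful of already-matched 2D points and fits a six-parameter affine map in closed form through the normal equations. The helper names (`solve_linear`, `fit_affine_2d`), the sample points, and the 256 × 256 × 256 volume size are hypothetical choices made for illustration and do not come from the article.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix; rows are copied so the caller's matrix is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_affine_2d(src, dst):
    """Least-squares 2D affine map dst ~ A @ src + t from matched points."""
    rows = [[x, y, 1.0] for x, y in src]
    # Normal equations (R^T R) p = R^T d, solved once per output coordinate.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):
        rhs = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(solve_linear(gram, rhs))
    return params  # [[a11, a12, tx], [a21, a22, ty]]


# Hypothetical matched points generated from a known affine map.
true_params = [[2.0, 0.5, 1.0], [-0.5, 1.5, -2.0]]
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [
    (
        true_params[0][0] * x + true_params[0][1] * y + true_params[0][2],
        true_params[1][0] * x + true_params[1][1] * y + true_params[1][2],
    )
    for x, y in src
]
est = fit_affine_2d(src, dst)

# Dense alternative: one 3D displacement vector per voxel of an assumed
# 256 x 256 x 256 volume, with no closed-form solution available.
dense_params = 3 * 256 ** 3
```

Here `est` recovers the six affine parameters in a single linear solve, whereas the dense displacement field for the assumed volume has roughly fifty million unknowns that must be found iteratively.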
In this work, we focus on dense deformable correspondence
matching, which is the non-linear and local (hence deformable)
alignment of two or more images into a common coordinate system.
Dense deformable correspondence matching is a fundamental problem in computer vision21, medical imaging22–24, microscopy25,26, and
remote sensing. Here, we focus on applications in biomedical and
biological imaging. In the biomedical and biological sciences,
deformable correspondence matching is also referred to as
deformable registration. Within dense deformations, diffeomorphisms are of special interest as a family of deformations that are
invertible transformations such that both the transform and its
inverse are differentiable. This allows us to accurately model the
correspondence between images while ensuring that the topological
structure of the anatomy is preserved, i.e., no tearing or folding of
the anatomy is introduced.
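A common discrete proxy for the "no folding" property is that the Jacobian determinant of the map stays positive everywhere. The following is a minimal sketch of that check on a 2D grid, assuming unit grid spacing and forward differences. The function names and the sinusoidal test displacement are hypothetical and are not taken from the article.

```python
import math


def min_jacobian_determinant(ux, uy):
    """Smallest forward-difference Jacobian determinant of phi = id + (ux, uy)."""
    n_rows, n_cols = len(ux), len(ux[0])
    smallest = float("inf")
    for i in range(n_rows - 1):
        for j in range(n_cols - 1):
            # Partial derivatives of phi(i, j) = (i + ux, j + uy), unit spacing.
            dpx_di = 1.0 + ux[i + 1][j] - ux[i][j]
            dpx_dj = ux[i][j + 1] - ux[i][j]
            dpy_di = uy[i + 1][j] - uy[i][j]
            dpy_dj = 1.0 + uy[i][j + 1] - uy[i][j]
            smallest = min(smallest, dpx_di * dpy_dj - dpx_dj * dpy_di)
    return smallest


def sine_displacement(n, amplitude):
    """Hypothetical test field: sinusoidal displacement along the first axis."""
    ux = [[amplitude * math.sin(2.0 * math.pi * i / n) for _ in range(n)]
          for i in range(n)]
    uy = [[0.0] * n for _ in range(n)]
    return ux, uy


n = 16
# Small amplitude: the grid is stretched and compressed but never folds.
min_det_smooth = min_jacobian_determinant(*sine_displacement(n, 1.0))
# Large amplitude: neighbouring grid points swap order, so the map folds.
min_det_folded = min_jacobian_determinant(*sine_displacement(n, 4.0))
```

A positive minimum determinant indicates that the sampled map is locally invertible and orientation preserving, while a non-positive value flags folding at some grid cell.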
We tackle two fundamental problems in dense
correspondence matching: ill-conditioning and scalability. First, ill-conditioning arises from the high-dimensional and heterogeneous
nature of the dense matching optimization objective, and it can be
mitigated by adaptive optimization methods. Although standard
adaptive optimization methods27,28 are shown to work in fixed
Euclidean spaces, it is not obvious how to extend this formulation to
the non-Euclidean space of diffeomorphisms. Fortunately, diffeomorphisms admit many useful mathematical properties, such as
being embedded in a Riemannian manifold, having a Lie group
structure, and admitting local geodesic formulations, all of which can be exploited for
adaptive optimization. We present a mathematically rigorous framework for adaptive optimization of diffeomorphic matching (see Section “Exploiting the group structure of diffeomorphisms”). This is
done by exploiting the group structure of diffeomorphisms to define
a custom gradient descent algorithm, followed by adaptive optimization on this space. Second, we observe that most existing state-of-the-art methods are prohibitively slow for high-resolution data,
which limits their applicability to rigorous hyperparameter studies,
large-scale data, or high-resolution alignment at mesoscopic or
microscopic resolutions.
Our meticulously implemented operational contributions lead to
an algorithm that is around 2–7× faster than state-of-the-art optimization toolkits on CPU, and up to three orders of magnitud (...truncated)