Group-Invariant Max Filtering
Foundations of Computational Mathematics
https://doi.org/10.1007/s10208-024-09656-9
Jameson Cahill¹ · Joseph W. Iverson² · Dustin G. Mixon³,⁴ · Daniel Packer³
Received: 27 October 2022 / Revised: 3 November 2023 / Accepted: 4 April 2024
© The Author(s) 2024
Abstract
Given a real inner product space V and a group G of linear isometries, we construct a
family of G-invariant real-valued functions on V that we call max filters. In the case
where $V = \mathbb{R}^d$ and G is finite, a suitable max filter bank separates orbits, and is even
bilipschitz in the quotient metric. In the case where $V = L^2(\mathbb{R}^d)$ and G is the group
of translation operators, a max filter exhibits stability to diffeomorphic distortion like
that of the scattering transform introduced by Mallat. We establish that max filters are
well suited for various classification tasks, both in theory and in practice.
Keywords Group invariance · Phase retrieval · Machine learning
Mathematics Subject Classification Primary 94A12; Secondary: 20G20 · 20-08 ·
42C20
Communicated by Joan Bruna.
Corresponding author: Dustin G. Mixon
1 Department of Mathematics and Statistics, University of North Carolina Wilmington, Wilmington, NC, USA
2 Department of Mathematics, Iowa State University, Ames, IA, USA
3 Department of Mathematics, The Ohio State University, Columbus, OH, USA
4 Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
1 Introduction
Modern machine learning has been extraordinarily successful in domains where large
volumes of labeled data are available [50, 68]. Indeed, highly expressive models can
generalize once they fit an appropriately large training set. Unfortunately, many important domains are plagued by a scarcity of data or by expensive labels (or both). One
way to bridge this gap is by augmenting the given dataset with the help of a large family
of innocuous distortions. In many cases, the distortions correspond to the action of a
group, meaning the ground truth exhibits known symmetries. Augmenting the training
set by applying the group action encourages the model to learn these symmetries.
While this approach has been successful [28, 30, 50, 69], it is extremely inefficient
to train a large, symmetry-agnostic model to find a highly symmetric function. One
wonders:
Why not use a model that already accounts for known symmetries?
This motivates invariant machine learning (e.g., [6, 43, 72, 79, 81]), where the
model is invariant to underlying symmetries in the data. To illustrate, suppose an
object is represented by a point x in a set V, but there is a group G acting on V such
that the same object is also represented by gx ∈ V for every g ∈ G. This ambiguity
emerges, for example, when using a matrix to represent a point cloud or a graph, since
the representation depends on the labeling of the points or vertices. If we apply a
G-invariant feature map $\Phi \colon V \to F$, then the learning task can be performed in the
feature domain F without having to worry about symmetries in the problem. If F is a
Euclidean space, then we can capitalize on the trove of machine learning methods that
rely on that particular structure. Furthermore, if $\Phi$ separates the G-orbits in V, then
no information is lost by passing to the feature domain. In practice, V and F tend to
be vector spaces out of convenience, and G is frequently a linear group.
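To make the point-cloud example concrete, the following is a minimal sketch, not taken from the paper. It assumes a point cloud of n points in R^d is stored as an n × d matrix, so that relabeling the points permutes the rows. The function name `canonical_rows` is hypothetical, and the map it implements (sorting the rows lexicographically) is only one simple choice of invariant that separates orbits; it is not a max filter, and unlike the maps studied below it is not continuous.

```python
import numpy as np

def canonical_rows(X):
    """Hypothetical invariant feature map: sort the rows of X lexicographically.

    Two matrices have the same output exactly when one is a row
    permutation of the other, so the map is invariant and separates orbits.
    """
    X = np.asarray(X, dtype=float)
    # np.lexsort treats the LAST key as the primary one, so reversing the
    # columns makes column 0 the primary sort key, then column 1, and so on.
    order = np.lexsort(X.T[::-1])
    return X[order].ravel()

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # a point cloud: 5 points in R^3
gX = X[rng.permutation(5)]        # same object, different labeling
Y = rng.standard_normal((5, 3))   # a different point cloud

same_orbit = np.array_equal(canonical_rows(X), canonical_rows(gX))
different_orbit = not np.array_equal(canonical_rows(X), canonical_rows(Y))
print(same_orbit, different_orbit)
```

Any learning method that expects Euclidean inputs can then be applied to the output vectors without further regard to the labeling of the points.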
While our interest in invariants stems from modern machine learning, maps like $\Phi$
have been studied since Cayley established invariant theory in the nineteenth century
[27]. Here, we take $V = \mathbb{C}^d$ and $G \le \mathrm{GL}(V)$, and the maps of interest consist of the
G-invariant polynomials $\mathbb{C}[V]^G$. Hilbert [44] proved that $\mathbb{C}[V]^G$ is finitely generated
as a $\mathbb{C}$-algebra in the special case where G is the image of a representation of $\mathrm{SL}(\mathbb{C}^k)$,
meaning one may take the feature domain F to be finite dimensional (one dimension
for each generator). Since G is not a compact subset of $\mathbb{C}^{d \times d}$ in such cases, there may
exist distinct G-orbits whose closures intersect, meaning no continuous G-invariant
function can separate them; this subtlety plays an important role in Mumford's more
general geometric invariant theory [58]. In general, the generating set of $\mathbb{C}[V]^G$ is
often extraordinarily large [38], making it impractical for machine learning applications. To alleviate this issue, there has been some work to construct separating sets
of polynomials [34–36], i.e., sets that separate as well as $\mathbb{C}[V]^G$ does without necessarily generating all of $\mathbb{C}[V]^G$. For every reductive group G, there exists a separating
set of 2d + 1 invariant polynomials [21, 38], but the complexity of evaluating these
polynomials is still quite large. Furthermore, these polynomials tend to have high
degree, and so they are numerically unstable. In applications, one also desires
a quantitative notion of separating so that distant orbits are not sent to nearby points
in the feature space, and this behavior is not always afforded by a separating set of
polynomials [21]. Despite these shortcomings, polynomial invariants are popular in
the data science literature due in part to their rich algebraic theory, e.g., [6, 9, 12, 21,
61].
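As a small illustration of polynomial invariants and of why high degree is a numerical concern, consider the symmetric group acting on R^d by permuting coordinates. The sketch below is not from the paper: it uses the classical power sums p_k(x) = x_1^k + ... + x_d^k for k = 1, ..., d, which generate the invariant polynomials in this particular case, rather than the separating sets of [34–36] or the 2d + 1 polynomials of [21, 38]. The function name `power_sums` and the choices d = 20 and delta = 1e-3 are assumptions made for illustration.

```python
import numpy as np

def power_sums(x):
    """Power sums p_k(x) = sum_i x_i**k for k = 1, ..., d (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    degrees = np.arange(1, x.size + 1)
    return np.array([np.sum(x ** k) for k in degrees])

rng = np.random.default_rng(0)
d = 20
x = rng.standard_normal(d)

# Invariance: permuting the coordinates leaves every power sum unchanged
# (up to floating-point rounding from the changed summation order).
invariant = np.allclose(power_sums(x), power_sums(rng.permutation(x)))

# Sensitivity: p_d is homogeneous of degree d, so rescaling x by (1 + delta)
# rescales p_d by (1 + delta)**d. A relative input change of delta becomes
# a relative output change of roughly d * delta.
delta = 1e-3
top_ratio = power_sums((1 + delta) * x)[-1] / power_sums(x)[-1]
print(invariant, top_ratio, (1 + delta) ** d)
```

The amplification factor grows with the degree, which is one simple way to see why high-degree invariants respond poorly to small perturbations of the input.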
In this paper, we focus on the case where V is a real inner product space and G is a
group of linear isometries of V . We introduce a family of non-polynomial invariants
that we call max filters. In Sect. 2, we define max filters, we identify some basic
properties, and we highlight a few familiar examples. In Sect. 3, we use ideas from
[39] to establish that 2d generic max filters separate all G-orbits when G is finite
(see Corollary 15), and then we describe various settings in which max filtering is
computationally efficient. In Sect. 4, we show that when G is finite, a sufficiently
large random max filter bank is bilipschitz with high probability; see Theorem 20.
This is the first known construction of invariant maps for a general class of groups
that enjoy a lower Lipschitz bound, meaning they separate orbits in a quantitative
sense. In the same section, we later show that when $V = L^2(\mathbb{R}^d)$ and G is the group
of translations, certain max filters exhibit stability to diffeomorphic distortion akin to
what Mallat established for his scattering transform in [54]; see Theorem 24. In Sect. 5,
we explain how to select max filters for classification in a couple of different settings,
we determine the subgradient of max filters to enable training, and we characterize
how random max filters behave for the symmetric group. In Sect. 6, we use max
filtering to process real-world datasets. Specifically, we visualize the shape space of
voting districts, we use electrocardiogram data to classify whether patients had a heart
attack, and we classify a multitude of textures. Surprisingly, we find that even in cases
where the data do not appear to exhibit symmetries in a group G, max filtering with
res (...truncated)