Group-Invariant Max Filtering

Foundations of Computational Mathematics
https://doi.org/10.1007/s10208-024-09656-9

Jameson Cahill¹ · Joseph W. Iverson² · Dustin G. Mixon³,⁴ · Daniel Packer³

Received: 27 October 2022 / Revised: 3 November 2023 / Accepted: 4 April 2024
© The Author(s) 2024

Communicated by Joan Bruna.

✉ Dustin G. Mixon

1 Department of Mathematics and Statistics, University of North Carolina Wilmington, Wilmington, NC, USA
2 Department of Mathematics, Iowa State University, Ames, IA, USA
3 Department of Mathematics, The Ohio State University, Columbus, OH, USA
4 Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA

Abstract

Given a real inner product space V and a group G of linear isometries, we construct a family of G-invariant real-valued functions on V that we call max filters. In the case where $V = \mathbb{R}^d$ and G is finite, a suitable max filter bank separates orbits, and is even bilipschitz in the quotient metric. In the case where $V = L^2(\mathbb{R}^d)$ and G is the group of translation operators, a max filter exhibits stability to diffeomorphic distortion like that of the scattering transform introduced by Mallat. We establish that max filters are well suited for various classification tasks, both in theory and in practice.

Keywords Group invariance · Phase retrieval · Machine learning

Mathematics Subject Classification Primary 94A12; Secondary: 20G20 · 20-08 · 42C20

1 Introduction

Modern machine learning has been extraordinarily successful in domains where large volumes of labeled data are available [50, 68]. Indeed, highly expressive models can generalize once they fit an appropriately large training set. Unfortunately, many important domains are plagued by a scarcity of data or by expensive labels (or both). One way to bridge this gap is by augmenting the given dataset with the help of a large family of innocuous distortions. In many cases, the distortions correspond to the action of a group, meaning the ground truth exhibits known symmetries. Augmenting the training set by applying the group action encourages the model to learn these symmetries. While this approach has been successful [28, 30, 50, 69], it is extremely inefficient to train a large, symmetry-agnostic model to find a highly symmetric function. One wonders: Why not use a model that already accounts for known symmetries?

This motivates invariant machine learning (e.g., [6, 43, 72, 79, 81]), where the model is invariant to underlying symmetries in the data. To illustrate, suppose an object is represented by a point x in a set V, but there is a group G acting on V such that the same object is also represented by gx ∈ V for every g ∈ G. This ambiguity emerges, for example, when using a matrix to represent a point cloud or a graph, since the representation depends on the labeling of the points or vertices. If we apply a G-invariant feature map $\Phi\colon V \to F$, then the learning task can be performed in the feature domain F without having to worry about symmetries in the problem. If F is a Euclidean space, then we can capitalize on the trove of machine learning methods that rely on that particular structure. Furthermore, if $\Phi$ separates the G-orbits in V, then no information is lost by passing to the feature domain. In practice, V and F tend to be vector spaces out of convenience, and G is frequently a linear group.

While our interest in invariants stems from modern machine learning, maps like $\Phi$ have been studied since Cayley established invariant theory in the nineteenth century [27]. Here, we take $V = \mathbb{C}^d$ and G ≤ GL(V), and the maps of interest consist of the G-invariant polynomials $\mathbb{C}[V]^G$.
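As a small illustration of a G-invariant polynomial feature map (a sketch that is not taken from the paper), consider the assumed special case where G is the symmetric group $S_d$ permuting coordinates. The power sums $p_k(x) = \sum_i x_i^k$ for k = 1, ..., d are then invariant polynomials, and classically they generate the symmetric polynomials over fields of characteristic zero. The function names `power_sums` and `permute` are hypothetical, and integer coordinates are used so that the equality checks are exact.

```python
# Hypothetical illustration: G = S_d acting on d-tuples by permuting coordinates.
# Integer coordinates are used (instead of general complex ones) so that the
# equality checks below are exact.

def power_sums(x):
    """Return [p_1(x), ..., p_d(x)], where p_k(x) = sum_i x_i**k."""
    d = len(x)
    return [sum(xi ** k for xi in x) for k in range(1, d + 1)]


def permute(x, perm):
    """Apply a permutation to x: entry i of the result is x[perm[i]]."""
    return [x[j] for j in perm]


x = [2, -1, 3, 0]
gx = permute(x, [2, 0, 3, 1])   # a relabeling of the same object
y = [2, -1, 3, 1]               # a point in a different orbit

# Invariance: every power sum takes the same value on x and on gx.
assert power_sums(x) == power_sums(gx)

# These two orbits are told apart (already by p_1, the coordinate sum).
assert power_sums(x) != power_sums(y)
```

Sorting the coordinates would be another $S_d$-invariant map on real vectors, though not a polynomial one.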
Hilbert [44] proved that $\mathbb{C}[V]^G$ is finitely generated as a $\mathbb{C}$-algebra in the special case where G is the image of a representation of $\mathrm{SL}(\mathbb{C}^k)$, meaning one may take the feature domain F to be finite dimensional (one dimension for each generator). Since G is not a compact subset of $\mathbb{C}^{d \times d}$ in such cases, there may exist distinct G-orbits whose closures intersect, meaning no continuous G-invariant function can separate them; this subtlety plays an important role in Mumford’s more general geometric invariant theory [58]. In general, the generating set of $\mathbb{C}[V]^G$ is often extraordinarily large [38], making it impractical for machine learning applications. To alleviate this issue, there has been some work to construct separating sets of polynomials [34–36], i.e., sets that separate as well as $\mathbb{C}[V]^G$ does without necessarily generating all of $\mathbb{C}[V]^G$. For every reductive group G, there exists a separating set of 2d + 1 invariant polynomials [21, 38], but the complexity of evaluating these polynomials is still quite large. Furthermore, these polynomials tend to have high degree, and so they are numerically unstable in practice. In practice, one also desires a quantitative notion of separating so that distant orbits are not sent to nearby points in the feature space, and this behavior is not always afforded by a separating set of polynomials [21]. Despite these shortcomings, polynomial invariants are popular in the data science literature due in part to their rich algebraic theory, e.g., [6, 9, 12, 21, 61].

In this paper, we focus on the case where V is a real inner product space and G is a group of linear isometries of V. We introduce a family of non-polynomial invariants that we call max filters. In Sect. 2, we define max filters, we identify some basic properties, and we highlight a few familiar examples. In Sect. 3, we use ideas from [39] to establish that 2d generic max filters separate all G-orbits when G is finite (see Corollary 15), and then we describe various settings in which max filtering is computationally efficient. In Sect. 4, we show that when G is finite, a sufficiently large random max filter bank is bilipschitz with high probability; see Theorem 20. This is the first known construction of invariant maps for a general class of groups that enjoy a lower Lipschitz bound, meaning they separate orbits in a quantitative sense. In the same section, we later show that when $V = L^2(\mathbb{R}^d)$ and G is the group of translations, certain max filters exhibit stability to diffeomorphic distortion akin to what Mallat established for his scattering transform in [54]; see Theorem 24. In Sect. 5, we explain how to select max filters for classification in a couple of different settings, we determine the subgradient of max filters to enable training, and we characterize how random max filters behave for the symmetric group. In Sect. 6, we use max filtering to process real-world datasets. Specifically, we visualize the shape space of voting districts, we use electrocardiogram data to classify whether patients had a heart attack, and we classify a multitude of textures. Surprisingly, we find that even in cases where the data do not appear to exhibit symmetries in a group G, max filtering with res (...truncated)