Multiple Cayley-Klein metric learning

PLOS ONE, Sep 2017

As a specific kind of non-Euclidean metric lying in projective space, the Cayley-Klein metric has recently been introduced into metric learning to deal with the complex data distributions found in computer vision tasks. In this paper, we extend the original Cayley-Klein metric to the multiple Cayley-Klein metric, which is defined as a linear combination of several Cayley-Klein metrics. Since the Cayley-Klein metric is non-linear, a combination of such metrics can model the data space better, thus leading to improved performance. We show how to learn a multiple Cayley-Klein metric by iteratively optimizing the individual Cayley-Klein metrics and their combination coefficients, with the objective of separating inter-class instances while gathering intra-class instances. Our experiments on several benchmarks are quite encouraging.




Yanhong Bi, Bin Fan, Fuchao Wu

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China
University of Chinese Academy of Sciences, Beijing, P.R. China

Editor: Chenping Hou, National University of Defense Technology, CHINA

Data Availability Statement: All relevant data are within the paper and its Supporting Information files.

Funding: This work is supported by the National Natural Science Foundation of China (No. 61375043, 61472119 and 61672032). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

An effective distance metric is of great importance for many computer vision and pattern recognition applications, such as clustering [1], retrieval [2, 3] and classification [4, 5]. Research has shown that the widely used Euclidean metric performs well mainly under the assumption of an isotropic data space.
Therefore, its performance is usually limited, since it cannot reasonably reflect the underlying relationships between input instances [6-9]. To take the correlation among different data dimensions into consideration, using the Mahalanobis metric is a popular solution. Because designing a specific Mahalanobis metric for a specific task is difficult, learning a Mahalanobis-like distance metric from labeled data has attracted growing attention in recent years [10, 11]. The underlying idea of Mahalanobis metric learning is to define an application-dependent metric that captures the characteristics of the data. It aims to learn a positive semi-definite (PSD) matrix $M$ that defines a specific Mahalanobis metric, i.e., $d^2(x, y) = (x - y)^T M (x - y)$. Different learning objectives have been proposed in the literature, for example, maximizing the distances between dissimilar samples while constraining the distances between similar samples [12], or maximizing the margin between similar pairs and dissimilar pairs [11]. Although Mahalanobis metric learning has been successfully applied in many applications, the learned metric is linear: writing $M = L^T L$, it is the Euclidean distance after the linear map $L$. However, it is widely believed that the high-dimensional data spaces encountered in computer vision applications are essentially non-linear. Therefore, researchers have resorted to more complicated non-linear metrics to pursue higher performance. These attempts include local metric learning [11, 13-16], kernel metric learning [17] and the recently proposed Cayley-Klein metric learning [18].

This paper follows the work on Cayley-Klein metric learning in [18]. We propose a multiple Cayley-Klein metric learning method, which learns several Cayley-Klein metrics and their linear combination weights to form a powerful non-linear metric. Each of the combined Cayley-Klein metrics focuses on a part of the data space and can be considered a locally optimized metric on a part of the training data.
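The Mahalanobis distance above can be illustrated with a short sketch. It is only an illustration: the helper name mahalanobis_sq is hypothetical, and M is built as L^T L, one common way (assumed here, not prescribed by the text) to guarantee that M is PSD. The last line checks the linearity remark above, namely that the metric equals the squared Euclidean distance after the linear map L.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ M @ diff)

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 3))
M = L.T @ L                                   # PSD by construction (assumption)

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
print(mahalanobis_sq(x, y, np.eye(3)))        # 10.25: M = I gives the squared Euclidean distance
# Mahalanobis distance under M = L^T L equals the squared Euclidean distance after mapping by L.
print(np.isclose(mahalanobis_sq(x, y, M), np.sum((L @ (x - y)) ** 2)))  # True
```

Learning M therefore amounts to learning a linear transformation of the data, which is exactly the limitation that motivates the non-linear metrics discussed next.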
To achieve this goal, we first partition the training data into different clusters according to their label information. Each cluster is assigned a local Cayley-Klein metric, which is learned only on the training data from the related cluster. Once these Cayley-Klein metrics have been learned, their combination weights are optimized by maximizing the distances between inter-class instances while restricting the distances between intra-class instances to be smaller than an upper bound. Combining these local metrics effectively yields a more powerful global metric for the whole data space. The local Cayley-Klein metrics and their weights are optimized iteratively to obtain a distance metric with high classification performance.

Related work

In this section, we first review related work on metric learning. Then, we give a brief introduction to Cayley-Klein geometries as a basis for our method.

Metric learning

When the general Euclidean distance cannot fulfill the requirements of many computer vision applications, it is natural to exploit the label information and the intrinsic structure of the training data to learn a specific but more powerful distance metric for a given task. Most works in the literature have focused on Mahalanobis metric learning. An early work on Mahalanobis metric learning is MMC, proposed by Xing et al. [12]. It learns a positive semi-definite metric matrix by maximizing the distances between instances from different classes while restricting the distances between instances from the same class to be smaller than a fixed upper bound. Based on this objective, they formulated metric learning as a convex optimization problem, which is solved by semidefinite programming. A similar objective was used as constraints by Davis et al. [10]. Subject to these constraints, Davis et al.
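MMC-style methods must keep M inside the PSD cone during optimization. A minimal sketch of one standard building block, the Frobenius-norm projection onto the PSD cone obtained by clipping negative eigenvalues, is shown below. The name project_psd is hypothetical, and this is not presented as the exact solver used in [12].

```python
import numpy as np

def project_psd(M):
    """Nearest PSD matrix to symmetric M in Frobenius norm: clip negative eigenvalues to zero."""
    M = (M + M.T) / 2.0                      # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(M)     # M = V diag(w) V^T
    eigvals = np.clip(eigvals, 0.0, None)    # drop the negative part of the spectrum
    return (eigvecs * eigvals) @ eigvecs.T   # V diag(max(w, 0)) V^T

M = np.array([[2.0, 0.0], [0.0, -1.0]])      # an indefinite matrix
P = project_psd(M)
print(np.allclose(P, np.diag([2.0, 0.0])))   # True
```

In a projected-gradient scheme, this projection would follow every gradient step on M; its eigendecomposition costs cubic time in the dimension, which is why avoiding it matters for high-dimensional data.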
proposed Information-Theoretic Metric Learning (ITML), which minimizes the differential relative entropy. Instead of restricting the intra-class distances below an upper bound, Globerson and Roweis [19] proposed to collapse them to zero. Guillaumin et al. [20] proposed a discriminative linear logistic regression model for Mahalanobis metric learning. Other well-known works include LMNN [11], which learns a Mahalanobis distance metric such that the k nearest neighbors of each instance always lie in the same class, while instances from different classes are separated by a large margin. By replacing the hinge loss in LMNN with the exponential loss, Shen et al. [21] proposed BoostMetric. They further proposed FrobMetric by adding a Frobenius norm regularization term to the objective function [22]. More recently, Lu et al. [23] proposed a neighborhood repulsed metric learning method for kinship verification. Their target is to learn a distance metric under which intra-class samples are pulled as close as possible, while inter-class samples lying in a neighborhood are repulsed and pushed as far away as possible. Wang et al. [24] proposed Shrinkage Expansion Adaptive Metric Learning (SEAML), which adaptively adjusts the bound constraints used in previous works [10, 12] by shrinking the distances between samples of similar pairs and expanding the distances between samples of dissimilar pairs. Law et al. [25] proposed Fantope regularization and applied it to Mahalanobis metric learning.

Beyond Mahalanobis metric learning, many researchers have also worked on non-Mahalanobis metric learning because of its potential for dealing with more complex intra- and inter-class variations. The kernel trick is the most straightforward technique for dealing with non-linearity, so it is natural to use kernel methods in metric learning, as in [17, 26].
Non-Euclidean spaces such as Riemannian and projective spaces have also been explored for metric learning. These methods include Riemannian and manifold metric learning [27, 28] and Cayley-Klein metric learning [18]. In [27], Cheng proposed Riemannian similarity learning by tackling the metric learning problem in a Riemannian optimization framework. In [18], Bi et al. showed that the Cayley-Klein metric can be incorporated into the metric learning frameworks of MMC [12] and LMNN [11] to obtain a better distance metric. Besides, Li et al. [29] proposed a margin-based method to learn a second-order discriminant function as the distance metric for verification problems. Some researchers have embedded metric learning into the framework of deep neural networks [30, 31].

Since our method learns several Cayley-Klein metrics locally and combines them into a global and powerful distance metric, it is most related to local metric learning [11, 13, 15, 32] and some mixed/compositional metric learning methods [16, 33]. MM-LMNN [11] is an extension of LMNN that learns a small number of metrics (typically one per class) in an effort to alleviate overfitting. Noh et al. [32] pointed out that finite sampling using the class-conditional probability distribution leads to a theoretical bias of the nearest neighbor classifier. They therefore proposed Generative Local Metric Learning (GLML), which uses local metrics to limit this bias. In [13], Wang et al. introduced PLML, a local metric learning method based on a finite number of linear metrics. They used the k-means algorithm to define anchor points as cluster means and optimized a combination of metric bases learned from these clusters. Reduced-Rank Local Metric Learning (R2LML) [15] learns k Mahalanobis-like local metrics that are then conically combined.
Additionally, a nuclear norm regularizer is adopted to obtain low-rank weight matrices for calculating the metrics, which controls the rank of the involved linear mappings through a sparsity-inducing matrix norm. Recently, Semerci and Alpaydin [16] proposed the Mixture of LMNN (MoLMNN) method, which learns a mixture of local Mahalanobis distances to better discriminate the data. It needs a gating function to softly partition the input space into several regions. In [33], SCML-local learns a sparse combination of locally discriminative metrics. This algorithm does not need to perform projections onto the PSD cone, which gives it a computational advantage for high-dimensional problems.

Different from these methods, the proposed multiple Cayley-Klein metric learning linearly combines several local Cayley-Klein metrics, whereas most previous methods combine Mahalanobis metrics. Owing to the intrinsic non-linearity of the Cayley-Klein metric, combining such metrics is more effective than combining linear metrics such as Mahalanobis metrics. Thus, our method has the potential to outperform previous methods. Moreover, in contrast to the sophisticated schemes used in previous works to partition the input space into several clusters for local metric learning, we use a simpler and more direct approach that utilizes the label information supplied with the training data.

Cayley-Klein geometries

Cayley-Klein geometries are branches of non-Euclidean geometry, a topic that can be traced back to the 19th century. Among the many mathematicians who conducted research on this topic were A. Cayley and F. Klein. In 1859, A. Cayley discovered that Euclidean geometry can be considered a special case of projective geometry, which led to his famous statement "descriptive geometry (his term for projective geometry) is all geometry" [34]. Ten years later, F. Klein [35, 36] followed A.
Cayley's ideas and showed that projective geometry can also provide a framework for the development of hyperbolic and elliptic geometries. His research mainly focused on the real Euclidean, hyperbolic and elliptic geometries, since he believed that only these geometries can describe the physical universe [37]. Based on their research, it is acknowledged that the Euclidean, hyperbolic and elliptic geometries are independent and self-subsistent geometries. Their research also led to working models for these different geometries. Owing to their distinguished work on this topic, both the hyperbolic and elliptic geometries are called Cayley-Klein geometries. They occupy a significant position in the foundations of geometry as geometries of constant curvature. Nowadays, the term "non-Euclidean geometry" is frequently used to refer either to hyperbolic geometry only [38] or to the hyperbolic and elliptic geometries together [39]. The reason for calling them "non-Euclidean" is perhaps that no other non-Euclidean geometries had been discovered earlier, and also that both violate the parallel postulate of Euclidean geometry. In Euclidean geometry, for each tangent to a circle there is a unique second parallel tangent; that is, through a given point there is a unique line parallel to a given line not passing through that point. In elliptic geometry, there are no parallels at all: since great circles are taken to be lines, any two different lines in a plane always intersect. In hyperbolic geometry, through a point not on a given line there are infinitely many parallels to that line.

Methods

Cayley-Klein metric

According to [34, 35], the Cayley-Klein metric is defined over an invertible symmetric matrix $G$ in projective space.
Mathematically, the Cayley-Klein distance between two data points $x_i, x_j \in \mathbb{R}^n$ in $n$-dimensional space is defined as:

$$d_{CK}(x_i, x_j; G) = \frac{k}{2i}\log\left(\frac{\sigma_{x_i x_j} + \sqrt{\sigma_{x_i x_j}^2 - \sigma_{x_i x_i}\sigma_{x_j x_j}}}{\sigma_{x_i x_j} - \sqrt{\sigma_{x_i x_j}^2 - \sigma_{x_i x_i}\sigma_{x_j x_j}}}\right) \quad (k > 0) \tag{1}$$

where

$$\sigma(x_i, x_j) = (x_i^T, 1)\, G \begin{pmatrix} x_j \\ 1 \end{pmatrix} \triangleq \sigma_{x_i x_j} \tag{2}$$

and $k$ is a parameter related to the space curvature [18]. Clearly, there is a one-to-one correspondence between the symmetric matrix $G \in \mathbb{R}^{(n+1)\times(n+1)}$ and the Cayley-Klein metric, i.e., a specific $G$ defines a specific Cayley-Klein metric. For this reason, $G$ is called the Cayley-Klein metric matrix. Depending on whether $G$ is positive definite or indefinite, there are two kinds of Cayley-Klein metric: when $G$ is positive definite, $d_{CK}$ is an elliptic Cayley-Klein metric; otherwise, it is a hyperbolic Cayley-Klein metric. Bi et al. [18] have shown that a special form of the Cayley-Klein metric approaches the Mahalanobis metric in an extreme case; for this reason, they also call it the generalized Mahalanobis metric. In their work, two specific metric learning methods were proposed for learning a data-dependent Cayley-Klein metric matrix.

Multiple Cayley-Klein metrics

In many computer vision tasks, data points from the same class are expected to lie near each other in the feature space, while data points from different classes are far from each other. On the one hand, a distance metric learned for one class may not perform well when applied to another class. On the other hand, a single distance metric learned on data from all classes is usually incapable of modeling multi-class decision boundaries, owing to the complexity of high-dimensional data spaces. For these reasons, we propose the multiple Cayley-Klein metric, which combines multiple Cayley-Klein metrics trained on different parts of the training set.
Since the Cayley-Klein metric is non-linear, combining several such metrics increases the non-linearity of the resulting metric, which can lead to better performance. The multiple Cayley-Klein metric is defined simply as

$$d_{mCK}(x_i, x_j) = \sum_{c=1}^{N} \alpha_c\, d_{CK}(x_i, x_j; G_c), \quad \left(\alpha_c > 0,\ \sum_{c=1}^{N} \alpha_c = 1\right) \tag{3}$$

Essentially, it linearly combines N different Cayley-Klein metrics with positive weights, so it satisfies the metric axioms as well. Note that $d_{CK}(x_i, x_j; G_c)$ is a Cayley-Klein metric learned on the c-th data cluster. When label information is available in the training data, we cluster the training data by their labels. In other words, $d_{CK}(x_i, x_j; G_c)$ is learned to maximize the performance related to the c-th class, for example, by making the distance between any two instances of the c-th class small and the distance between an instance of the c-th class and an instance from another class large. In this case, N is set equal to the number of classes. If label information is unavailable, the training data can be partitioned into N clusters by any unsupervised clustering method, such as k-means. In this paper, we focus only on the supervised case, since the purpose of metric learning is to improve a metric's performance by using labeled training data.

Fig 1 illustrates the basic idea of the proposed multiple Cayley-Klein metric learning method with a toy example. There are two classes of data in (a), denoted by squares and circles, and three classes of data in (b), denoted by squares, circles and triangles, respectively. In situation (a), a single non-linear metric achieves the same classification goal as two linear metrics. In situation (b), however, a single non-linear metric is not enough: at least two non-linear metrics, or even more linear ones, are needed to separate the data. Therefore, multiple Cayley-Klein metrics actually correspond to a series of Riemannian metrics with several different (but fixed) curvatures, which we expect to model more complex data distributions.
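The distances of Eqs (1)-(3) can be sketched numerically as follows, under the assumption that every G_c is positive definite (the elliptic case). Then σ_ij² ≤ σ_ii σ_jj by the Cauchy-Schwarz inequality, the square root in Eq (1) is imaginary, and, with the branch of the logarithm chosen so that the result lies in [0, kπ], Eq (1) takes the real form k·arccos(σ_ij / √(σ_ii σ_jj)); the sketch computes that form. The function names, the default k = 1 and the example matrices are hypothetical choices for illustration.

```python
import numpy as np

def elliptic_ck_distance(xi, xj, G, k=1.0):
    """Elliptic Cayley-Klein distance k * arccos(s_ij / sqrt(s_ii * s_jj)); assumes G positive definite."""
    hi, hj = np.append(xi, 1.0), np.append(xj, 1.0)            # homogeneous coordinates (x^T, 1)
    s_ij, s_ii, s_jj = hi @ G @ hj, hi @ G @ hi, hj @ G @ hj    # sigma of Eq (2)
    cos_angle = np.clip(s_ij / np.sqrt(s_ii * s_jj), -1.0, 1.0)  # guard against round-off
    return float(k * np.arccos(cos_angle))

def multiple_ck_distance(xi, xj, Gs, alphas, k=1.0):
    """Eq (3): sum_c alpha_c * d_CK(xi, xj; G_c) with alpha_c > 0 and sum_c alpha_c = 1."""
    alphas = np.asarray(alphas, dtype=float)
    if np.any(alphas <= 0) or not np.isclose(alphas.sum(), 1.0):
        raise ValueError("combination weights must be positive and sum to 1")
    return float(sum(a * elliptic_ck_distance(xi, xj, G, k) for a, G in zip(alphas, Gs)))

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
G1 = np.eye(3)                     # hypothetical positive definite matrices for points in R^2
G2 = np.diag([4.0, 1.0, 1.0])
print(elliptic_ck_distance(x, y, G1))                    # arccos(1/2) = pi/3, about 1.0472
print(multiple_ck_distance(x, y, [G1, G2], [0.5, 0.5]))  # average of the two single distances
```

The weight check in multiple_ck_distance mirrors the constraints attached to Eq (3); each G_c here would in practice be a matrix learned on the c-th class.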
Fig 1. Intuitive illustration of multiple Cayley-Klein metrics and a single Cayley-Klein metric by a toy example. (a) Non-linear metric vs. linear metrics. (b) Multiple non-linear metrics vs. a single non-linear metric.

In the following, we describe the formulation of multiple Cayley-Klein metric learning and then explain how to optimize the objective function.

Metric learning

Suppose we have a training set of N classes. According to the label information, we organize it into N sets of similar pairs $S = \{S_c, c = 1, 2, \ldots, N\}$ and N sets of dissimilar pairs $D = \{D_c, c = 1, 2, \ldots, N\}$. $S_c$ consists of pairs of samples from the c-th class, while $D_c$ contains pairs of dissimilar samples, one from the c-th class and the other from the j-th class, $j \neq c$. Following the learning criteria widely used in the metric learning community, we formulate our objective as follows:

$$\begin{aligned} \max_{\alpha_c, G_c} \quad & \sum_{c=1}^{N} \alpha_c \sum_{(x_i,x_j)\in D_c} d_{CK}(x_i,x_j;G_c) \\ \text{subject to} \quad & \sum_{c=1}^{N} \alpha_c \sum_{(x_i,x_j)\in S_c} d_{CK}(x_i,x_j;G_c) \le 1, \\ & G_c \succ 0,\ \alpha_c > 0,\ \sum_{c=1}^{N}\alpha_c = 1 \end{aligned} \tag{4}$$

Our objective is to learn a multiple Cayley-Klein metric under which the distances of dissimilar pairs are as large as possible, while the summed distances of similar pairs are restricted to at most 1. Directly optimizing the above problem is difficult. Here, we propose to optimize $\alpha_c$ and $G_c$ alternately.

Optimize α. Given N Cayley-Klein matrices $G_c$, the problem of solving for $\alpha_c$ is formulated as:

$$\begin{aligned} \max_{\alpha_c,\, c=1,2,\ldots,N} \quad & \sum_{(x_i,x_j)\in D'} \sum_{c=1}^{N} \alpha_c\, d_{CK}(x_i,x_j;G_c) \\ \text{subject to} \quad & \sum_{c=1}^{N} \alpha_c \Big(\sum_{(x_i,x_j)\in S_c} d_{CK}(x_i,x_j;G_c)\Big) \le 1, \\ & \alpha_c > 0,\ \sum_{c=1}^{N}\alpha_c = 1 \end{aligned} \tag{5}$$

Such a linear programming problem is easy to solve. Note that concatenating all sets of dissimilar pairs $D_c, c = 1, 2, \ldots, N$ yields duplicated pairs; $D'$ is the set of dissimilar pairs obtained after removing the duplicates from $D$.

Optimize $G_c$. Once the weights are fixed, the problem in (4) can be separated into N subproblems, which are solved one by one.
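With every G_c fixed, problem (5) is a linear program in α: writing a_c for the summed distances of the pairs in D' under G_c and b_c for the summed distances of the pairs in S_c under G_c, it maximizes a·α subject to b·α ≤ 1 with α on the probability simplex. The sketch below solves this small LP exactly by enumerating the vertices of the feasible polytope instead of calling a general LP solver; the function name and this solution strategy are illustrative assumptions. Because an LP optimum sits at a vertex, several weights can come out as zero, so the strict constraint α_c > 0 is relaxed to α_c ≥ 0 here.

```python
import itertools
import numpy as np

def solve_alpha_lp(a, b):
    """Maximize a @ alpha s.t. b @ alpha <= 1, alpha >= 0, sum(alpha) = 1, by vertex enumeration.

    a[c]: summed distances of the dissimilar pairs in D' under G_c.
    b[c]: summed distances of the similar pairs in S_c under G_c.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    vertices = []
    # Corners of the simplex that already satisfy the similar-pair constraint.
    for c in range(n):
        if b[c] <= 1.0:
            v = np.zeros(n)
            v[c] = 1.0
            vertices.append(v)
    # Points where the hyperplane b @ alpha = 1 cuts an edge of the simplex.
    for c, d in itertools.combinations(range(n), 2):
        if (b[c] - 1.0) * (b[d] - 1.0) < 0:
            t = (1.0 - b[d]) / (b[c] - b[d])   # alpha_c = t, alpha_d = 1 - t
            v = np.zeros(n)
            v[c], v[d] = t, 1.0 - t
            vertices.append(v)
    if not vertices:
        raise ValueError("infeasible: every b[c] exceeds 1")
    return max(vertices, key=lambda v: float(a @ v))

alpha = solve_alpha_lp(a=[3.0, 5.0], b=[0.5, 2.0])
print(alpha)   # approximately [0.667 0.333]
```

For many classes, a general-purpose LP solver would be the more conventional choice; vertex enumeration is used here only because the feasible set is a simplex cut by a single half-space.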
For the c-th subproblem, it is:

$$\begin{aligned} \max_{G_c} \quad & \sum_{(x_i,x_j)\in D_c} \alpha_c\, d_{CK}(x_i,x_j;G_c) \\ \text{subject to} \quad & \text{(a)}\ \alpha_c \sum_{(x_i,x_j)\in S_c} d_{CK}(x_i,x_j;G_c) + \sum_{p\neq c} \alpha_p \sum_{(x_i,x_j)\in S_p} d_{CK}(x_i,x_j;G_p) \le 1, \\ & \text{(b)}\ G_c \succ 0 \end{aligned} \tag{6}$$

Since the matrix $G_c$ in the objective is symmetric positive definite, it is convenient to optimize over $L_c$ after the Cholesky decomposition $G_c = L_c^T L_c$ with $L_c \in \mathbb{R}^{(n+1)\times(n+1)}$. In this way, the above problem can be solved by gradient ascent. At each iteration, we take a gradient ascent step on the objective function

$$\varepsilon(L_c) = \sum_{(x_i,x_j)\in D_c} \alpha_c\, d_{CK}(x_i,x_j;L_c) \tag{7}$$

with respect to $L_c$. Parameterizing $G_c$ through the Cholesky factor keeps constraint (b) satisfied (as long as $L_c$ is nonsingular). We then only need to adjust the updated $L_c$ so that it fulfills constraint (a). Specifically, given an updated $L_c$, its approximation $L'$ that meets constraint (a) can be obtained by solving the following minimization problem:

$$\begin{aligned} \min_{L'} \quad & \|L' - L_c\|_F \\ \text{subject to} \quad & \alpha_c \sum_{(x_i,x_j)\in S_c} d_{CK}(x_i,x_j;L') + \sum_{p\neq c} \alpha_p \sum_{(x_i,x_j)\in S_p} d_{CK}(x_i,x_j;L_p) \le 1 \end{aligned} \tag{8}$$

For simplicity, we denote $C_{x_i x_j} = (x_i^T, 1)^T (x_j^T, 1)$; then

$$\sigma(x_i, x_j) = \mathrm{tr}(C_{x_i x_j} G_c) = \mathrm{tr}\big(C_{x_i x_j} (L_c^T L_c)\big) \tag{9}$$

Suppose the matrix $L_c$ at the t-th iteration is $L_t$. Writing $C_{ij}$ and $\sigma_{ij}$ for $C_{x_i x_j}$ and $\sigma_{x_i x_j}$, and using the symmetric part $\tfrac{1}{2}(C_{ij} + C_{ij}^T)$ in place of $C_{ij}$ (which leaves $\sigma_{ij}$ unchanged because $G_c$ is symmetric), the gradient of the objective function at the t-th iteration is

$$\nabla_{L_t} = \frac{\partial \varepsilon(L_c)}{\partial L_c}\Big|_{L_t} = \frac{k}{2i}\, 2 L_t\, \alpha_c \sum_{(x_i,x_j)\in D_c} \frac{2C_{ij} - \frac{\sigma_{ij}}{\sigma_{ii}} C_{ii} - \frac{\sigma_{ij}}{\sigma_{jj}} C_{jj}}{\sqrt{\sigma_{ij}^2 - \sigma_{ii}\sigma_{jj}}} \tag{10}$$

Initialization. To start the alternating optimization procedure described above, we have to initialize $\alpha_c$ and $G_c$ in a reasonable way. Bi et al. [18] proposed a specific method for constructing a Cayley-Klein matrix from a given dataset, called the generalized Mahalanobis matrix. They showed experimentally that initializing with the generalized Mahalanobis matrix performs better than initializing with an identity matrix or a random matrix. Therefore, we also use the generalized Mahalanobis matrix to initialize $G_c$.
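The G_c-step can be sketched as follows, assuming the elliptic form of the distance (positive definite G_c, written as k·arccos(σ_ij / √(σ_ii σ_jj))). Here init_G builds the generalized Mahalanobis matrix that Eq (11) specifies for class c, np.linalg.cholesky yields G_c = L_c^T L_c, and objective_and_grad evaluates ε(L_c) of Eq (7) together with the gradient of Eq (10), written in real form as −α_c k L (2C_ij − (σ_ij/σ_ii) C_ii − (σ_ij/σ_jj) C_jj) / √(σ_ii σ_jj − σ_ij²) with the symmetric C_ij. All function names, the ridge term added to the covariance and the learning rate are hypothetical, and the projection of Eq (8) onto constraint (a) is deliberately omitted.

```python
import numpy as np

def init_G(X_c, k_c=1.0, ridge=1e-6):
    """Generalized Mahalanobis matrix (Eq 11) from the rows of X_c, the samples of class c."""
    n = X_c.shape[1]
    m = X_c.mean(axis=0)
    # Inverse covariance; the small ridge is an added assumption that keeps it invertible.
    S = np.linalg.inv(np.cov(X_c, rowvar=False) + ridge * np.eye(n))
    S = (S + S.T) / 2.0
    v = -S @ m
    G = np.empty((n + 1, n + 1))
    G[:n, :n] = S
    G[:n, n] = v
    G[n, :n] = v
    G[n, n] = m @ S @ m + k_c ** 2
    return G

def objective_and_grad(L, pairs, alpha_c, k=1.0):
    """eps(L) of Eq (7) for the elliptic distance, and its gradient with respect to L (Eq 10)."""
    value, grad = 0.0, np.zeros_like(L)
    for xi, xj in pairs:
        hi, hj = np.append(xi, 1.0), np.append(xj, 1.0)
        Li, Lj = L @ hi, L @ hj
        s_ij, s_ii, s_jj = Li @ Lj, Li @ Li, Lj @ Lj           # sigma = tr(C L^T L)
        value += alpha_c * k * np.arccos(s_ij / np.sqrt(s_ii * s_jj))
        C_ij = 0.5 * (np.outer(hi, hj) + np.outer(hj, hi))     # symmetric part of C_ij
        inner = 2.0 * C_ij - (s_ij / s_ii) * np.outer(hi, hi) - (s_ij / s_jj) * np.outer(hj, hj)
        grad -= alpha_c * k * (L @ inner) / np.sqrt(s_ii * s_jj - s_ij ** 2)
    return float(value), grad

def ascent_step(L, pairs, alpha_c, k=1.0, lr=1e-2):
    """One plain gradient-ascent step on eps(L); the projection of Eq (8) is not included."""
    _, grad = objective_and_grad(L, pairs, alpha_c, k)
    return L + lr * grad

# Hypothetical usage for one class c with two dissimilar pairs.
rng = np.random.default_rng(0)
X_c = rng.normal(size=(40, 2))
G_c = init_G(X_c)
L_c = np.linalg.cholesky(G_c).T      # numpy returns lower-triangular C with G = C C^T
pairs = [(X_c[0], np.array([3.0, 3.0])), (X_c[1], np.array([-3.0, 2.0]))]
before, _ = objective_and_grad(L_c, pairs, alpha_c=0.5)
after, _ = objective_and_grad(ascent_step(L_c, pairs, alpha_c=0.5, lr=1e-3), pairs, alpha_c=0.5)
print(after > before)
```

By construction σ(m, m) = k_c² for the class mean m, and the Schur complement of the Σ block equals k_c² > 0, so the initial G_c is positive definite and the Cholesky factor exists.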
Since $G_c$ is a local metric mainly focused on the c-th class, we use the mean $m^{(c)}$ and the inverse covariance $\Sigma^{(c)}$ computed from the samples of the c-th class. In this way, we initialize $G_c$ with the following matrix:

$$G_c = \begin{pmatrix} \Sigma^{(c)} & -\Sigma^{(c)} m^{(c)} \\ -m^{(c)T}\Sigma^{(c)} & m^{(c)T}\Sigma^{(c)} m^{(c)} + k^{(c)2} \end{pmatrix} \quad (k^{(c)} > 0) \tag{11}$$

$\alpha_c$ is simply initialized as 1/N. Combining all of the above, we summarize the proposed Multiple Cayley-Klein Metric Learning (MCKML) algorithm as follows:

Algorithm 1. Multiple Cayley-Klein Metric Learning (MCKML)
Input: N classes of labeled training data (organized into N sets of similar pairs and N sets of dissimilar pairs), a convergence error threshold.
Output: $\alpha_c$, $G_c$, c = 1, 2, ..., N
Begin
1. Set $\alpha_c = 1/N$ and $G_c$ according to Eq (11).
2. Optimize α by solving (5) with linear programming.
3. for c = 1 to N do
4.     Optimize $G_c$ by solving (6).
5. end for
6. Repeat steps 2-5 until $\sum_{c=1}^{N} |G_c^{update} - G_c^{previous}|$ is below the convergence error threshold, where $G_c^{update}$ and $G_c^{previous}$ denote the updated and the previous $G_c$, respectively.
7. return α and $G_c$
End

Experiments

In this section, we evaluate the proposed method on image classification tasks using three public benchmarks. For comparison, we also test CK-MMC and MMC, since they share the same learning target as our method and differ only in the definition of the distance metric. Moreover, LMNN and CK-LMNN are evaluated because of their good performance. Additionally, MM-LMNN and SCML-local are tested as typical local metric learning methods.

Results on the UCI datasets

Datasets: In this experiment, we use 9 different datasets from the UCI Machine Learning Repository at, which are widely used in evaluating metric learning methods. These datasets are Wine, Ionosphere, Vowel, Balance, Pima, Vehicle, Segmentation, Waveform and Letter. The characteristics of each UCI dataset, such as the number of data points, the feature dimension and the number of classes, are summarized in Table 1.
Setup: For each dataset, we randomly divide it into training/validation/test sets. The numbers of samples in the training/validation/test subsets are shown in Table 1, and the proportion of these three subsets is approximately 60%/20%/20%. All features are first normalized over the training data to have zero mean and unit variance; features of the validation and test data are normalized using the mean and variance of the training data. The parameters of all methods are set according to the authors' recommendations. LMNN, MM-LMNN and CK-LMNN use 3 target neighbors and all impostors, while these are set to 3 and 10 in SCML-local. The k-nearest neighbor (kNN) classifier is used for classification, with k = 3 for all datasets. We repeat this procedure 10 times and report the average accuracies.

Results: Table 2 shows the classification accuracies of the seven evaluated methods. Consistent with previous work, replacing the traditional Mahalanobis metric with the Cayley-Klein metric improves performance, as can be seen from "CK-MMC vs. MMC" and "CK-LMNN vs. LMNN". Among all the evaluated methods, the proposed MCKML performs best on 6 of the 9 datasets. On two datasets (Balance and Letter), it performs second best, closely following the best result (SCML-local). Note that CK-LMNN, MM-LMNN and SCML-local use a learning target based on triplets of samples, which is more powerful than the pair-based learning target used in MCKML. Under the same learning target, MCKML consistently improves over MMC and CK-MMC on all datasets. Incorporating MCKML into the learning paradigm of LMNN is expected to improve its performance further; we leave this as future work. For a more rigorous comparison, we perform a paired t-test at significance level 0.05 to statistically evaluate which result is better.
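A paired t-test of this kind can be sketched with the Python standard library. The sketch assumes that the 10 accuracies of two methods come from the same 10 random splits, and it hard-codes the two-sided critical value t(0.975, 9) ≈ 2.262 for 10 runs; the helper names and the example accuracies are hypothetical. It returns "<" when the second method is significantly better (the notation used for Table 3), "*" when the difference is not significant, and ">" for the reverse case.

```python
import math
import statistics

# Two-sided critical value of Student's t for alpha = 0.05 and 9 degrees of freedom (10 paired runs).
T_CRIT_DF9 = 2.262

def paired_t_statistic(acc_a, acc_b):
    """t statistic of the per-split differences acc_b - acc_a."""
    diffs = [b - a for a, b in zip(acc_a, acc_b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def compare(acc_a, acc_b, t_crit=T_CRIT_DF9):
    """'<' if B is significantly better than A, '>' if significantly worse, '*' otherwise."""
    t = paired_t_statistic(acc_a, acc_b)
    if t > t_crit:
        return "<"
    if t < -t_crit:
        return ">"
    return "*"

# Hypothetical accuracies (%) of two methods over the same 10 random splits.
acc_a = [80.0, 81.0, 79.0, 82.0, 80.0, 81.0, 80.0, 79.0, 81.0, 80.0]
acc_b = [82.0, 82.0, 82.0, 84.0, 82.0, 82.0, 82.0, 82.0, 83.0, 82.0]
print(compare(acc_a, acc_b))   # prints <
```

Pairing by split removes the split-to-split variance shared by both methods, which is why a paired test is used rather than comparing the two means independently.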
The comparison results with CK-MMC, CK-LMNN and two local metric learning methods (MM-LMNN and SCML-local) are summarized in Table 3. We use "*" to indicate that the classification results of the two methods are not significantly different at the given confidence level, and "<" to indicate that the mean classification accuracy of the latter method is statistically higher than that of the former.

Table 3. Paired t-test results (one row per UCI dataset)
CKMMC < CKLMNN * MMLMNN < SCML-local * MCKML
CKMMC < CKLMNN < MMLMNN < SCML-local * MCKML
CKMMC < SCML-local * MMLMNN < CKLMNN * MCKML
CKMMC < CKLMNN < MMLMNN < MCKML < SCML-local
SCML-local < CKLMNN * MMLMNN * CKMMC < MCKML
CKLMNN * CKMMC * MMLMNN < SCML-local < MCKML
CKMMC * CKLMNN * MMLMNN * SCML-local < MCKML
CKMMC < MCKML * CKLMNN * MMLMNN * SCML-local
CKMMC < CKLMNN < MMLMNN * MCKML * SCML-local

From the paired t-test results, we can conclude at a 95% confidence level that the proposed MCKML generally outperforms CK-MMC and is comparable with or better than CK-LMNN, MM-LMNN and SCML-local on all datasets except Balance.

Visualization of the learned metric: To provide a better understanding of why the proposed MCKML works well, and to further show the benefit of enlarging the non-linearity, we provide a graphical illustration using t-SNE [40] on the Segmentation dataset with MMC, CK-MMC and MCKML. In the first row of Fig 2, we can see that although CK-MMC improves on MMC, MCKML obtains a further improvement. Under the metric obtained by MCKML, the distributions of the different classes (denoted by different colors) are more concentrated. Meanwhile, each class is far from the other classes and the boundaries are clearer. The second row shows that the learned metrics generalize consistently to test data. Such a visualization supports the use of the Cayley-Klein metric as well as the multiple Cayley-Klein metric.
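The UCI evaluation protocol from the Setup paragraph (z-score normalization fitted on the training split only, then a 3-NN classifier under a given distance) can be sketched as below. The helpers zscore_fit and knn_predict are hypothetical, the Euclidean distance in the usage example is only a placeholder for a learned metric such as d_mCK, the synthetic data stand in for a real dataset, and ties in the majority vote are broken by first occurrence, which is an assumption.

```python
import numpy as np
from collections import Counter

def zscore_fit(X_train):
    """Per-feature mean and standard deviation estimated on the training split only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0                 # leave constant features unscaled
    return mu, sd

def knn_predict(X_train, y_train, X_test, dist, k=3):
    """Majority vote among the k nearest training points under the distance function dist."""
    preds = []
    for x in X_test:
        d = np.array([dist(x, xt) for xt in X_train])
        nearest = np.argsort(d, kind="stable")[:k]
        preds.append(Counter(y_train[nearest].tolist()).most_common(1)[0][0])
    return np.array(preds)

# Hypothetical usage with two well-separated synthetic classes.
rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(8.0, 1.0, (30, 2))])
y_tr = np.array([0] * 30 + [1] * 30)
X_te = np.vstack([rng.normal(0.0, 1.0, (10, 2)), rng.normal(8.0, 1.0, (10, 2))])
y_te = np.array([0] * 10 + [1] * 10)

mu, sd = zscore_fit(X_tr)
X_tr_n, X_te_n = (X_tr - mu) / sd, (X_te - mu) / sd   # test data reuse the training statistics
euclid = lambda a, b: float(np.linalg.norm(a - b))   # placeholder for a learned metric
accuracy = np.mean(knn_predict(X_tr_n, y_tr, X_te_n, euclid, k=3) == y_te)
print(accuracy)
```

Fitting the normalization on the training split alone keeps the validation and test data unseen, as the protocol requires; repeating the split 10 times and averaging gives the reported accuracies.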
Results on the PubFig dataset

Dataset: The Public Figure Face Database (PubFig) [41] is a challenging real-world face database collected from the internet. It contains 58,797 images of 200 people. The images in this database were taken in completely uncontrolled situations with non-cooperative subjects, leading to large variations in pose, lighting, expression, scene, camera, imaging conditions and parameters. Similar to [18, 25], our experiment uses a subset of PubFig containing 772 images of 8 identities: Alex Rodriguez (Alex), Clive Owen (Clive), Hugh Laurie (Hugh), Jared Leto (Jared), Miley Cyrus (Miley), Scarlett Johansson (Scarlett), Viggo Mortensen (Viggo) and Zac Efron (Zac). We use 11-dimensional relative attributes [42] to represent each image. The relative attributes are computed from a concatenation of the 512-dimensional GIST descriptor [43] and a 45-dimensional LAB color histogram. We use the publicly available code of [42] to compute the relative attributes.

Setup: For all evaluated methods, we randomly select 30 images per class for training and 30 images per class for validation, and use the remaining images for testing. In the test stage, we use a 3-NN classifier based on the learned distance metric. We repeat this procedure 10 times and report the average classification accuracies.

Results: The results are listed in Table 4. We obtain observations similar to those on the UCI datasets: MCKML outperforms MMC and CK-MMC in all cases, while it is slightly inferior to CK-LMNN in some categories (for the reason explained in the previous subsection).

Fig 2. Illustrative experiment on the Segmentation dataset in 2D. (a)-(c) Distributions of training points under the learned metrics (MMC, CK-MMC and MCKML, respectively). (d)-(f) Distributions of test points under the learned metrics (MMC, CK-MMC and MCKML, respectively).
Moreover, the two local metric learning methods MM-LMNN and SCML-local, which both use a set of triplet constraints as in LMNN, perform better than LMNN and are comparable to CK-LMNN. In terms of overall performance, MCKML is the best. Comparing the results of MCKML with those of CK-MMC and CK-LMNN, it is clear that learning multiple Cayley-Klein metrics improves on learning a single Cayley-Klein metric: although Cayley-Klein metric learning already improves on traditional Mahalanobis metric learning, multiple Cayley-Klein metric learning improves its performance further.

Results on the OSR dataset

Dataset: The Outdoor Scene Recognition dataset (OSR) [43] contains 2688 images from 8 outdoor scene categories: tall buildings (B), inside city (IC), street (S), highways (H), coast (C), open country (OC), mountain (M) and forest (F). We use 6-dimensional relative attributes generated from 512-dimensional GIST descriptors to represent the images.

Setup: As in the PubFig experiment, we randomly select 30 images per class for metric learning and 30 images per class for validation, and use the remaining images to test the performance of the learned metric. A 3-NN classifier is used for classification. We repeat this procedure 10 times and report the average classification accuracies.

SCML-local [33]: 82.1 83.0 83.4 80.2 78.9 85.2 79.1 84.9 (average 82.1 ± 1.1)

Results: The classification results on the OSR dataset are listed in Table 5. Owing to the more powerful learning objective based on triplets of samples, LMNN and CK-LMNN outperform MMC and CK-MMC, respectively, in all categories, with an average improvement of over 2%. Under the same learning objective, using the Cayley-Klein metric (CK-MMC) outperforms using the Mahalanobis metric (MMC) by 3%. The proposed multiple Cayley-Klein metric improves on the Cayley-Klein metric by another 3%.
The local metric learning methods MM-LMNN and SCML-local outperform the original LMNN but are inferior to CK-LMNN.

Finally, we find that the results in Tables 2, 4 and 5 are rather consistent, although these datasets are fundamentally different from each other. Among all the tested methods, the proposed MCKML achieves the best average classification accuracy, and is only slightly inferior in some categories to CK-LMNN, which uses a more powerful learning objective based on triplets. When using the same objective based on pairs of samples, our method outperforms previous methods on all tested categories.

Table 6 shows the running times of the different methods on OSR and PubFig, averaged over 10 runs. Generally speaking, using the Cayley-Klein metric requires a little more time in testing, since computing the Cayley-Klein metric involves more operations according to its definition. For training, MMC and CK-MMC each need one loop of gradient ascent to find the optimal solution, whereas MCKML needs two loops, which is more time-consuming: the outer loop alternates between optimizing α and the Cayley-Klein matrices $G_c$, while the inner loop solves for $G_c$ by gradient ascent, identical to CK-MMC. Compared with MCKML, the other two local metric learning methods, MM-LMNN and SCML-local, are more efficient.

MCKML: 76.3 48.6 75.9 77.9 71.9 61.4 68.4 88.9 (average 71.2 ± 1.0)

Conclusion

This paper follows a very recent work on Cayley-Klein metric learning, which was the first to introduce the ancient Cayley-Klein geometries into computer vision. We show that Cayley-Klein metric learning can benefit from learning multiple local Cayley-Klein metrics, each of which focuses only on a part of the data space. To this end, we propose the multiple Cayley-Klein metric learning method, which alternately optimizes the local Cayley-Klein metrics and their global combination weights.
Although our learning objective is identical to that of some previous works, i.e., maximizing the inter-class distances while constraining the intra-class distances to be less than an upper bound, our method achieves better performance on three widely used datasets, as shown in the experiments. These results demonstrate the advantage of multiple Cayley-Klein metric learning over single Cayley-Klein metric learning, traditional Mahalanobis metric learning, and state-of-the-art local metric learning.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61375043, 61472119 and 61672032). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author Contributions

Conceptualization: Fuchao Wu.
Data curation: Yanhong Bi.
Formal analysis: Yanhong Bi, Bin Fan, Fuchao Wu.
Funding acquisition: Bin Fan, Fuchao Wu.
Investigation: Yanhong Bi.
Methodology: Yanhong Bi, Bin Fan, Fuchao Wu.
Resources: Fuchao Wu.
Software: Yanhong Bi.
Supervision: Bin Fan, Fuchao Wu.
Writing – original draft: Yanhong Bi.
Writing – review & editing: Bin Fan.

References

1. MacQueen JB. On convergence of k-means and partitions with minimum average variance. The Annals of Mathematical Statistics. 1965.
2. Chechik G, Sharma V, Shalit U, Bengio S. Large scale online learning of image similarity through ranking. Journal of Machine Learning Research. 2010;11:1109–1135.
3. Frome A, Singer Y, Sha F, Malik J. Learning globally-consistent local distance functions for shape-based image retrieval and classification. In: ICCV; 2007. p. 1–8.
4. Cover T, Hart P. Nearest neighbor pattern classification. IEEE Transactions on Information Theory. 1967;13(1):21–27.
5. Lim D, McFee B, Lanckriet G. Robust structural metric learning. In: ICML. vol. 28; 2013. p. 615–623.
6. Hastie T, Tibshirani R. Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1996;18(6):607–616.
7. Domeniconi C, Gunopulos D. Adaptive nearest neighbor classification using support vector machines. In: NIPS; 2002. p. 665–672.
8. Peng J, Heisterkamp D, Dai H. Adaptive kernel metric nearest neighbor classification. In: International Conference on Pattern Recognition. vol. 3; 2002. p. 33–36.
9. Goldberger J, Roweis S, Hinton G, Salakhutdinov R. Neighbourhood components analysis. In: NIPS; 2004. p. 513–520.
10. Davis JV, Kulis B, Jain P, Sra S, Dhillon IS. Information-theoretic metric learning. In: ICML; 2007. p. 209–216.
11. Weinberger K, Saul L. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research. 2009;10:207–244.
12. Xing E, Ng A, Jordan M, Russell S. Distance metric learning, with application to clustering with side-information. In: NIPS; 2002. p. 505–512.
13. Wang J, Woznica A, Kalousis A. Parametric local metric learning for nearest neighbor classification. In: NIPS; 2012. p. 1610–1618.
14. Xiong C, Johnson D, Xu R, Corso JJ. Random forests for metric learning with implicit pairwise position dependence. In: KDD; 2012. p. 958–966.
15. Huang Y, Li C, Georgiopoulos M, Anagnostopoulos GC. Reduced-rank local distance metric learning. In: ECML PKDD; 2013. p. 224–239.
16. Semerci M, Alpaydin E. Mixtures of large margin nearest neighbor classifiers. In: ECML PKDD; 2013. p. 675–688.
17. Wang J, Do HT, Woznica A, Kalousis A. Metric Learning with Multiple Kernels. In: NIPS; 2011. p. 1170–1178.
18. Bi Y, Fan B, Wu F. Beyond Mahalanobis Metric: Cayley-Klein Metric Learning. In: CVPR; 2015. p. 2339–2347.
19. Globerson A, Roweis S. Metric Learning by Collapsing Classes. In: NIPS; 2005. p. 451–458.
20. Guillaumin M, Verbeek J, Schmid C. Is that you? Metric learning approaches for face identification. In: ICCV; 2009. p. 498–505.
21. Shen C, Kim J, Wang L, van den Hengel A. Positive Semidefinite Metric Learning with Boosting. In: NIPS; 2009. p. 1007–1036.
22. Shen C, Kim J, Wang L. A scalable dual approach to semidefinite metric learning. In: CVPR; 2011. p. 2601–2608.
23. Lu J, Zhou X, Tan YP, Shang Y, Zhou J. Neighborhood repulsed metric learning for kinship verification. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2014;36(2):331–345. doi: 10.1109/TPAMI.2013.134. PMID: 24356353.
24. Wang Q, Zuo W, Zhang L, Li P. Shrinkage Expansion Adaptive Metric Learning. In: ECCV; 2014.
25. Law MT, Thome N, Cord M. Fantope regularization in metric learning. In: CVPR; 2014. p. 1051–1058.
26. Yeung DY, Chang H. A kernel approach for semisupervised metric learning. IEEE Transactions on Neural Networks. 2007;18(1):141–149. PMID: 17278468.
27. Cheng L. Riemannian similarity learning. In: ICML; 2013. p. 540–548.
28. Huang Z, Wang R, Shan S, Chen X. Learning Euclidean-to-Riemannian metric for point-to-set classification. In: CVPR; 2014. p. 1677–1684.
29. Li Z, Cao L, Chang S, Smith JR, Huang TS. Beyond Mahalanobis distance: Learning second-order discriminant function for people verification. In: CVPR Workshops; 2012. p. 45–50.
30. Hu J, Lu J, Tan YP. Discriminative Deep Metric Learning for Face Verification in the Wild. In: CVPR; 2014. p. 1875–1882.
31. Han X, Leung T, Jia Y, Sukthankar R, Berg AC. MatchNet: Unifying feature and metric learning for patch-based matching. In: CVPR; 2015. p. 3279–3286.
32. Noh YK, Zhang BT, Lee DD. Generative local metric learning for nearest neighbor classification. In: NIPS; 2010. p. 1822–1830.
33. Shi Y, Bellet A, Sha F. Sparse Compositional Metric Learning. In: AAAI; 2014. p. 2078–2084.
34. Cayley A. A sixth memoir upon quantics. Philosophical Transactions of the Royal Society of London. 1859;149:61–90.
35. Klein F. Über die sogenannte Nicht-euklidische Geometrie. Mathematische Annalen. 1871;4:573–625.
36. Klein F. Über die sogenannte Nicht-euklidische Geometrie (Zweiter Aufsatz). Mathematische Annalen. 1873;6:112–145.
37. Klein F. Vorlesungen über Nicht-euklidische Geometrie. Julius Springer; 1928.
38. Borsuk K, Szmielew W. Foundations of Geometry. 1st ed. North Holland, Amsterdam; 1960.
39. Coxeter HSM. Non-Euclidean Geometry. University of Toronto Press; 1957.
40. van der Maaten LJP, Hinton GE. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research. 2008;9:2579–2605.
41. Kumar N, Berg A, Belhumeur P, Nayar S. Attribute and simile classifiers for face verification. In: ICCV; 2009. p. 365–372.
42. Parikh D, Grauman K. Relative attributes. In: ICCV; 2011. p. 503–510.
43. Oliva A, Torralba A. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision. 2001;42(3):145–175.


Yanhong Bi, Bin Fan, Fuchao Wu. Multiple Cayley-Klein metric learning, PLOS ONE, 2017, DOI: 10.1371/journal.pone.0184865