Least Square Support Tensor Regression Machine Based on Submatrix of the Tensor
Tuo Shu and Zhi-Xia Yang
College of Mathematics and Systems Science, Xinjiang University, Urumqi 830046, China
Correspondence should be addressed to Zhi-Xia Yang; xjyangzhx@sina.com
Received 14 March 2017; Revised 10 October 2017; Accepted 15 October 2017; Published 9 November 2017
Academic Editor: Gisella Tomasini
Copyright © 2017 Tuo Shu and Zhi-Xia Yang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
For the tensor regression problem, a novel method, called least square support tensor regression machine based on submatrix of a tensor (LS-STRM-SMT), is proposed. LS-STRM-SMT deals with tensor regression problems more efficiently. First, we develop the least square support matrix regression machine (LS-SMRM) and propose a fixed point algorithm to solve it. Then LS-STRM-SMT for tensor data is proposed. Inspired by the relation between a color photograph and its gray component images, we reformulate the tensor training set and form the new LS-STRM-SMT model for the tensor regression problem. With the introduction of projection matrices and another fixed point algorithm, we turn the LS-STRM-SMT model into several related LS-SMRM models, which are solved by the algorithm for LS-SMRM. Since the fixed point algorithm is used twice while solving the LS-STRM-SMT problem, we call the overall procedure the dual fixed point algorithm (DFPA). Our method (LS-STRM-SMT) has been compared with several typical support tensor regression machines (STRMs). From a theoretical point of view, our algorithm has fewer parameters and lower computational complexity, especially when the rank of the submatrix is small. The numerical experiments indicate that our algorithm has better performance.
Over the past decades, matrix and, more generally, multiway array (tensor) data have found an increasing number of applications. For example, raster images are essentially digital readings of a grid of sensors, and matrix analysis is widely applied in image processing, for example, to photorealistic images of faces, palms, and medical images. In web search, a large number of tensors that represent images can be found easily. Therefore, tensor data analysis, particularly the regression problem [6, 7], has become one of the most important topics for face recognition, palmprint recognition, and so on.
Tensor data have greatly drawn people's attention. Recently, several tensor learning approaches for regression [10, 11] have appeared, but the majority of them handle tensor regression problems on vector spaces derived by stacking the original tensor elements in a more or less arbitrary order. This vectorization of data causes several problems. First, the structural information is destroyed. Second, vectorizing a tensor may produce an extremely high-dimensional vector, which can lead to high computational complexity, overfitting, and large memory requirements. The remaining methods mainly take advantage of the decomposition of a matrix or tensor, which reduces the high computational complexity and dimensionality at the expense of a slight decline in accuracy, but the structural information is destroyed completely. A more reasonable method is therefore needed, one that preserves the underlying structural information while avoiding overfitting, high dimensionality, and high computational complexity.
Considering that a color photograph can be expressed as a third-order tensor, each frontal slice of which is a gray image containing almost all the information of the photograph, we take advantage of this observation by introducing the submatrix of a tensor and an abstract vector space when solving tensor regression problems. That is, each tensor sample can be regarded as an abstract vector whose elements are submatrix-type features. Gathering the same feature information across the different tensor samples, we construct new submatrix training sets and the same number of related training models, from which we obtain an equal number of weight submatrices; the weight tensor is then assembled from them. Besides, we improve the fixed point algorithm via projection matrices, including a series of left projection matrices and a shared right projection matrix. The improved algorithm is called the dual fixed point algorithm (DFPA). The projection matrices not only couple the training models but also reduce the computational complexity and memory requirements. That is to say, we turn the LS-STRM-SMT problem into a battery of least square support matrix regression machine (LS-SMRM) problems by fixing the projection matrices, and then solve the LS-SMRM problems with the fixed point algorithm. The numerical experiments indicate that our method and algorithm have better performance.
The paper is organized as follows: in Section 2, notations and preliminaries are introduced, such as the definitions and notation related to tensors that will be used. In Section 3, we propose our LS-SMRM for matrix regression problems and the fixed point algorithm to solve it. In Section 4, we propose the LS-STRM-SMT models and develop the DFPA to solve them. Computational comparisons on both UCI data sets and artificial data are done in Section 5, and conclusions are drawn in Section 6.
2. Notations and Preliminaries
Here, we give a brief description of the notation used in the later sections. Boldface capital letters, for example, A, boldface lowercase letters, for example, a, and lowercase letters, for example, a, are used to denote matrices, vectors, and scalars, respectively. Tensors, regarded as multidimensional arrays, are denoted by Euler script calligraphic letters, for example, 𝒜 ∈ R^{I_1×I_2×⋯×I_N}, where N denotes the order of the tensor. Just as the ith element of a vector a is denoted by a_i, the elements of an N-order tensor 𝒜 are denoted by a_{i_1 i_2 ⋯ i_N}.
For an N-order tensor 𝒜 ∈ R^{I_1×I_2×⋯×I_N}, the n-mode matricization, also known as unfolding or flattening, is denoted by

  unfold_n(𝒜) = A_(n) ∈ R^{I_n × (I_1⋯I_{n−1}I_{n+1}⋯I_N)}.  (1)

It is clear that we can reorder the elements of the tensor into a matrix in this way. Conversely, we define a mapping function

  fold_n(A_(n)) = 𝒜  (2)

to recover a tensor from its unfolding matrix. In particular, when N = 3, the 1-mode unfolding places the frontal slices of 𝒜 side by side:

  A_(1) = [A^{(1)}, A^{(2)}, …, A^{(I_3)}].  (3)

The inner product of two same-size tensors 𝒜, ℬ ∈ R^{I_1×⋯×I_N} is defined as

  ⟨𝒜, ℬ⟩ = Σ_{i_1} ⋯ Σ_{i_N} a_{i_1⋯i_N} b_{i_1⋯i_N}.  (4)

The Frobenius norm of a tensor is thus defined as

  ‖𝒜‖_F = √⟨𝒜, 𝒜⟩,  (5)

and it can be shown that

  ‖𝒜‖_F = ‖A_(n)‖_F for every mode n.  (6)

The CANDECOMP/PARAFAC (CP) decomposition factorizes an N-order tensor into a linear combination of rank-one tensors, written as

  𝒜 ≈ Σ_{r=1}^{R} a_r^{(1)} ∘ a_r^{(2)} ∘ ⋯ ∘ a_r^{(N)},  (7)

where the operator ∘ is the outer product of vectors and the factor matrix A^{(n)} = [a_1^{(n)}, …, a_R^{(n)}] is of size I_n × R. For convenience, unless otherwise stated, the tensors mentioned in the following content are third-order tensors.
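As an illustration, the mode-n matricization (1), its inverse (2), and the norm identity (6) can be sketched in a few lines of NumPy. The column ordering below (move the unfolded axis to the front, then flatten) is one common convention; the paper does not pin a specific ordering down, so treat it as an assumption:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: move axis `mode` to the front, then flatten."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold`: recover the tensor from its mode-n unfolding."""
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
A1 = unfold(A, 1)                               # shape (4, 15)
assert np.allclose(fold(A1, 1, A.shape), A)     # fold undoes unfold
# The Frobenius norm is invariant under matricization, as in (6).
assert np.isclose(np.linalg.norm(A), np.linalg.norm(A1))
```

Any consistent ordering works, as long as `fold` is the exact inverse of `unfold` for the same mode.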
3. Least Square Support Matrix Regression Machine (LS-SMRM)
In this section, we propose the least square support matrix regression machine, abbreviated as LS-SMRM, for the regression problem with matrix input.
Given a training set

  T = {(X_1, y_1), …, (X_N, y_N)},  (8)

where X_i ∈ R^{m×n} is the input and y_i ∈ R is the output, i = 1, …, N, our task is to find a predictor

  f(X) = ⟨W, X⟩ + b,  (9)

where W ∈ R^{m×n} denotes the weight matrix and b ∈ R is the bias. For a new input matrix, we can predict its output through the above predictor.
In order to obtain predictor (9), we develop the following optimization problem:

  min_{W,b,ξ} (1/2)‖W‖_F^2 + (λ/2) Σ_{i=1}^{N} ξ_i^2
  s.t. y_i = ⟨W, X_i⟩ + b + ξ_i, i = 1, …, N,  (10)

where λ > 0 is the penalty parameter.
According to the CP decomposition (7), matrices U = [u_1, …, u_r] ∈ R^{m×r} and V = [v_1, …, v_r] ∈ R^{n×r} can be found such that

  W = Σ_{k=1}^{r} u_k v_k^T = U V^T,  (11)

where r ≤ min(m, n). Then optimization problem (10) can be turned into the following:

  min_{U,V,b,ξ} (1/2)‖U V^T‖_F^2 + (λ/2) Σ_{i=1}^{N} ξ_i^2
  s.t. y_i = tr(V U^T X_i) + b + ξ_i, i = 1, …, N.  (12)

The fixed point algorithm is applied to solve optimization problem (12). When V is fixed, we need to compute U and b. First, denote

  X̃_i = X_i V, i = 1, …, N,  (13)

so that tr(V U^T X_i) = ⟨U, X̃_i⟩, and optimization problem (12) is equivalent to

  min_{U,b,ξ} (1/2)‖U V^T‖_F^2 + (λ/2) Σ_{i=1}^{N} ξ_i^2
  s.t. y_i = ⟨U, X̃_i⟩ + b + ξ_i, i = 1, …, N.  (14)

Letting

  u = vec(U), x̃_i = vec(X̃_i), G = (V^T V) ⊗ I_m,  (15)

optimization problem (14) is reformulated as

  min_{u,b,ξ} (1/2) u^T G u + (λ/2) Σ_{i=1}^{N} ξ_i^2
  s.t. y_i = x̃_i^T u + b + ξ_i, i = 1, …, N.  (16)

The Lagrange function of optimization problem (16) can be expressed as

  L(u, b, ξ; α) = (1/2) u^T G u + (λ/2) Σ_i ξ_i^2 − Σ_i α_i (x̃_i^T u + b + ξ_i − y_i),  (17)

where α = (α_1, …, α_N)^T is the Lagrangian multiplier vector. Then the KKT system of (17) is

  G u = Σ_i α_i x̃_i,  (18)
  Σ_i α_i = 0,  (19)
  λ ξ_i = α_i, i = 1, …, N.  (20)

Rewriting (18), (19), and (20) together with the equality constraints in matrix form, we get the linear system

  [0, e^T; e, Ω + I/λ] [b; α] = [0; y],  (21)

where e = (1, …, 1)^T, y = (y_1, …, y_N)^T, and Ω_{ij} = x̃_i^T G^{−1} x̃_j. The multipliers α and the bias b are obtained by solving linear system (21). Then u is obtained according to (18), and the left projection matrix U is recovered through relations (13) and (15). In summary, when we fix V, the solution of optimization problem (12) can be computed by solving linear system (21) directly.
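The low-rank factorization (11) and the trace form of the constraint in (12) can be checked numerically; a minimal sketch with arbitrary sizes m, n and rank r:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 5, 2
U, V = rng.standard_normal((m, r)), rng.standard_normal((n, r))
X = rng.standard_normal((m, n))
b = 0.3

W = U @ V.T                               # rank-r weight matrix, as in (11)
f1 = np.sum(W * X) + b                    # <W, X> + b, the predictor (9)
f2 = np.trace(V @ (U.T @ X)) + b          # tr(V U^T X) + b, as in (12)
assert np.isclose(f1, f2)
```

The identity ⟨UV^T, X⟩ = tr(V U^T X) is what lets each alternating step treat the factored problem as a linear one.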
Similarly, when U is fixed, we can derive the optimal V and b by solving another linear system. First, denote

  X̂_i = X_i^T U, i = 1, …, N,  (23)

so that optimization problem (12) is equivalent to

  min_{V,b,ξ} (1/2)‖U V^T‖_F^2 + (λ/2) Σ_{i=1}^{N} ξ_i^2
  s.t. y_i = ⟨V, X̂_i⟩ + b + ξ_i, i = 1, …, N.  (24)

Letting

  v = vec(V), x̂_i = vec(X̂_i), H = (U^T U) ⊗ I_n,  (25)

optimization problem (24) is reformulated as

  min_{v,b,ξ} (1/2) v^T H v + (λ/2) Σ_{i=1}^{N} ξ_i^2
  s.t. y_i = x̂_i^T v + b + ξ_i, i = 1, …, N.  (26)

The Lagrange function of optimization problem (26) can be expressed as

  L(v, b, ξ; α) = (1/2) v^T H v + (λ/2) Σ_i ξ_i^2 − Σ_i α_i (x̂_i^T v + b + ξ_i − y_i),  (27)

where α is the Lagrangian multiplier vector. Then the KKT system of (27) is

  H v = Σ_i α_i x̂_i,  (28)
  Σ_i α_i = 0,  (29)
  λ ξ_i = α_i, i = 1, …, N.  (30)

Rewriting (28), (29), and (30) in matrix form, we get

  [0, e^T; e, Ω̂ + I/λ] [b; α] = [0; y],  (31)

where

  Ω̂_{ij} = x̂_i^T H^{−1} x̂_j.  (32)

Repeating the iterative operation until convergence, the weight matrix W is obtained by (11), and the predictor is

  f(X) = ⟨U V^T, X⟩ + b.  (33)

According to the above description, we can summarize the following algorithm.
Algorithm 1 (LS-SMRM).
(1) Training Process
Input. Training set (8).
Output. Left and right projection matrices U, V, and bias b.
(a) Initialize U and V.
(b) Formulate X̃_i by (13).
(c) Alternately update (U, b) and (V, b) until convergence:
(1) Update U and b.
(1.1) Get α and b by solving linear system (21).
(1.2) Get u from (18).
(1.3) Get U from (15).
(1.4) Get X̂_i from (23).
(2) Update V and b.
(2.1) Get α and b by solving linear system (31).
(2.2) Get v from (28).
(2.3) Get V from (25).
(2.4) Get X̃_i from (13).
(2) Testing Process
Input. Testing point X, left and right projection matrices U, V, and bias b.
Output. The value of the testing point: f(X) = ⟨U V^T, X⟩ + b.
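Putting the pieces together, a compact NumPy sketch of the alternating scheme in Algorithm 1 follows. It is a simplified reading of the method, not the authors' implementation: the fixed factor is orthonormalized (QR) before each solve so that the regularizer in (14)/(24) reduces to a plain squared norm, and each half-step is then a standard LS-SVR KKT solve of the shape of (21)/(31); the function names `lssvr` and `ls_smrm` are ours:

```python
import numpy as np

def lssvr(Z, y, lam):
    """Solve the LS-SVR KKT system [0, e^T; e, ZZ^T + I/lam][b; a] = [0; y]."""
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = Z @ Z.T + np.eye(N) / lam
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return Z.T @ sol[1:], sol[0]          # weight vector, bias

def ls_smrm(Xs, y, r=2, lam=1e3, iters=50, seed=0):
    """Alternating (fixed point) sketch of LS-SMRM with W = U V^T."""
    m, n = Xs[0].shape
    V = np.linalg.qr(np.random.default_rng(seed).standard_normal((n, r)))[0]
    for _ in range(iters):
        Vo = np.linalg.qr(V)[0]           # orthonormal right factor
        # Step 1: V fixed, solve for U and b (features X_i V).
        u, b = lssvr(np.stack([(X @ Vo).ravel() for X in Xs]), y, lam)
        U = np.linalg.qr(u.reshape(m, r))[0]   # orthonormal left factor
        # Step 2: U fixed, solve for V and b (features X_i^T U).
        v, b = lssvr(np.stack([(X.T @ U).ravel() for X in Xs]), y, lam)
        V = v.reshape(n, r)               # (U, V, b) consistent here
    return U @ V.T, b                     # weight matrix W and bias

# Noiseless rank-1 toy data: the alternation should fit it closely.
rng = np.random.default_rng(1)
Xs = [rng.standard_normal((4, 3)) for _ in range(20)]
W_true = np.outer(rng.standard_normal(4), rng.standard_normal(3))
y = np.array([np.sum(W_true * X) + 0.5 for X in Xs])
W, b = ls_smrm(Xs, y)
mse = np.mean([(np.sum(W * X) + b - yi) ** 2 for X, yi in zip(Xs, y)])
```

The orthonormalization is a convenience that absorbs all scaling into the factor currently being solved; it does not change the set of representable W.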
4. LS-STRM Based on Submatrix of the Tensor (LS-STRM-SMT)
In this section, we propose the least square support tensor regression machine, abbreviated as LS-STRM, for the regression problem with tensor input. In fact, with the introduction of the submatrix of the tensor, the LS-STRM problem is turned into several LS-SMRM problems, which would otherwise be independent. However, the right projection matrices should be made equal to fit the practical situation. That is to say, we need to solve the LS-SMRM problems with the same right projection matrix. To show the effectiveness of the proposed algorithm, we provide some deeper analysis in this section.
Given a training set

  T = {(𝒳_1, y_1), …, (𝒳_N, y_N)},  (35)

where 𝒳_i ∈ R^{I_1×I_2×I_3} is the input and y_i ∈ R is the output, i = 1, …, N, our task is to find a predictor

  f(𝒳) = ⟨𝒲, 𝒳⟩ + b,  (36)

where 𝒲 ∈ R^{I_1×I_2×I_3} denotes the weight tensor and b is the bias. For a new input tensor, we can predict its output through predictor (36). For convenience, we set p = I_3.
Definition 2. For an N-order tensor 𝒜 ∈ R^{I_1×⋯×I_N}, the submatrices of the tensor are the I_1 × I_2 matrices obtained by fixing all but the first two indices:

  A^{(i_3,…,i_N)} = 𝒜(:, :, i_3, …, i_N) ∈ R^{I_1×I_2}.  (37)

In particular, when N = 3, the submatrices of a third-order tensor are exactly the frontal slices.
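For the third-order case, the submatrices of Definition 2 are just the frontal slices, which NumPy indexing makes explicit:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)       # third-order tensor, I1 x I2 x I3 = 2 x 3 x 4
# Frontal slices: fix the third index, keep the first two.
slices = [A[:, :, k] for k in range(A.shape[2])]
assert len(slices) == 4 and slices[0].shape == (2, 3)
# Stacking the slices back along the third axis recovers the tensor.
assert np.array_equal(np.stack(slices, axis=2), A)
```

This round trip (slice, then stack) is exactly how the weight tensor is later reassembled from its slice-wise weight matrices.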
However, we do not construct the model for training set (35) directly. We transform training set (35) into p training sets similar to training set (8) by introducing the submatrices of the tensor, and then construct p regression problems. That is, for training set (35), every input tensor 𝒳_i, i = 1, …, N, can be regarded as an abstract vector

  𝒳_i = (X_i^{(1)}, X_i^{(2)}, …, X_i^{(p)}),  (38)

where X_i^{(k)}, k = 1, …, p, is the kth frontal slice of the tensor 𝒳_i. Next, we take the kth frontal slice of each tensor 𝒳_i, i = 1, …, N, and construct the following training set:

  T^{(k)} = {(X_1^{(k)}, y_1^{(k)}), …, (X_N^{(k)}, y_N^{(k)})},  (39)

where y_i^{(k)} denotes the kth element of y_i = (y_i^{(1)}, …, y_i^{(p)})^T, a vector obeying a normal distribution with the prescribed mean and variance.
According to training set (39), p optimization problems are constructed as follows:

  min (1/2)‖W^{(k)}‖_F^2 + (λ/2) Σ_{i=1}^{N} (ξ_i^{(k)})^2
  s.t. y_i^{(k)} = ⟨W^{(k)}, X_i^{(k)}⟩ + b^{(k)} + ξ_i^{(k)}, i = 1, …, N,  (40)

where k = 1, …, p. So we can get the weight matrices

  W^{(k)} = U^{(k)} (V^{(k)})^T, k = 1, …, p,  (41)

and then the weight tensor can be obtained by stacking them as frontal slices:

  𝒲 = (W^{(1)}, W^{(2)}, …, W^{(p)}).  (42)

It is clear that the p models are independent, but this is contrary to the truth. In order to reflect the relation among them, we set V^{(1)} = V^{(2)} = ⋯ = V^{(p)} = V so that the models can fit the practical situation better. The models can be expressed together as follows:

  min Σ_{k=1}^{p} [(1/2)‖U^{(k)} V^T‖_F^2 + (λ/2) Σ_{i=1}^{N} (ξ_i^{(k)})^2]
  s.t. y_i^{(k)} = tr(V (U^{(k)})^T X_i^{(k)}) + b^{(k)} + ξ_i^{(k)}, i = 1, …, N, k = 1, …, p.  (43)

That means what we need to solve is optimization problem (43).
4.2. Dual Fixed Point Algorithm (DFPA) for LS-STRM-SMT
Fixing V and optimizing (U^{(k)}, b^{(k)}), optimization problem (43) can be reformulated as follows:

  min (1/2)‖U^{(k)} V^T‖_F^2 + (λ/2) Σ_{i=1}^{N} (ξ_i^{(k)})^2
  s.t. y_i^{(k)} = ⟨U^{(k)}, X_i^{(k)} V⟩ + b^{(k)} + ξ_i^{(k)}, i = 1, …, N,  (44)

where k = 1, …, p.
That is to say, p problems of the form (44), which are indeed LS-SMRMs, rather than optimization problem (43) itself, need to be solved.
Fixing U^{(k)} and optimizing V. Similarly, when U^{(k)}, k = 1, …, p, are fixed, optimization problem (43) can be reformulated as follows:

  min Σ_{k=1}^{p} [(1/2)‖U^{(k)} V^T‖_F^2 + (λ/2) Σ_{i=1}^{N} (ξ_i^{(k)})^2]
  s.t. y_i^{(k)} = ⟨V, X̂_i^{(k)}⟩ + b^{(k)} + ξ_i^{(k)}, i = 1, …, N, k = 1, …, p,  (45)

where

  X̂_i^{(k)} = (X_i^{(k)})^T U^{(k)}.  (46)

It is clear that V can be obtained by solving optimization problems (45)-(46) with Algorithm 1. This leads to Algorithm 3.
Algorithm 3 (LS-STRM-SMT).
(1) Training Process
Input. Training set (35).
Output. U^{(k)}, V, and b^{(k)}, k = 1, …, p.
(a) Construct the p new submatrix training sets (39).
(b) Give a positive integer r.
(c) Initialize U^{(k)}, k = 1, …, p, and V.
(d) Alternately update U^{(k)} and V until convergence:
(1) Update U^{(k)}, b^{(k)}, k = 1, …, p, by solving optimization problems (44) with Algorithm 1.
(2) Update V by solving optimization problems (45)-(46) with Algorithm 1.
(2) Testing Process
Input. Testing point 𝒳, projection matrices U^{(k)}, V, and biases b^{(k)}, k = 1, …, p.
Output. The value of the testing point: f(𝒳) = Σ_{k=1}^{p} [⟨U^{(k)} V^T, X^{(k)}⟩ + b^{(k)}].
For a more general N-order tensor with N > 3, we can also take advantage of the submatrices of the tensor to turn the tensor regression problem into LS-SMRM problems that can be solved by Algorithm 1. The details are as follows.
Given a training set

  T = {(𝒳_1, y_1), …, (𝒳_N, y_N)},  (49)

where 𝒳_i ∈ R^{I_1×I_2×⋯×I_N} is the input and y_i ∈ R is the output, i = 1, …, N, our task is to find a predictor

  f(𝒳) = ⟨𝒲, 𝒳⟩ + b,  (50)

where 𝒲 denotes the weight tensor and b is the bias. For a new input tensor, we can predict its output through predictor (50). The new training sets are constructed through (1) as follows:

  T^{(k)} = {(X_1^{(k)}, y_1^{(k)}), …, (X_N^{(k)}, y_N^{(k)})}, k = 1, …, q,  (51)

where X_i^{(k)} denotes the kth submatrix of the ith sample, q = I_3 I_4 ⋯ I_N, and y_i^{(k)} denotes the kth element of y_i, a q-dimensional vector obeying a normal distribution with the prescribed mean and variance. Then we obtain a q-dimensional abstract vector whose elements are matrices of size I_1 × I_2. According to mapping function (2), the resulting q weight matrices can be reassembled, and the abstract vector whose elements are these matrices is indeed the N-order weight tensor that we need to obtain.
5. Numerical Experiment
In the following numerical experiments, we use four groups of vector data from the UCI database and an artificial tensor data set to evaluate our algorithm. The data are Slump, Ticdata2000, ConcreteData, and BlogData from the UCI database. The artificial data are generated by the function "rand" in Matlab. We reformulate the vector data into matrices or tensors by rearranging the order of the vectors' elements. The detailed statistical characteristics are listed in Table 1.
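The rearrangement of a vector sample into a matrix or tensor is a plain reshape; the factorization of the dimension (3 × 4 versus 2 × 3 × 2 below) is a modeling choice, and the sizes here are illustrative, not the ones used in the experiments:

```python
import numpy as np

x = np.arange(12.0)            # a 12-dimensional vector sample
X_mat = x.reshape(3, 4)        # matrix view, 3 x 4
X_ten = x.reshape(2, 3, 2)     # third-order tensor view, 2 x 3 x 2
assert X_mat.size == X_ten.size == x.size   # same entries, new layout
```

Different factorizations of the same dimension give different "data sizes" in the tables below, which is why results are reported per size.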
Table 1: Characters of different data sets.
Since our method is solved by a fixed point algorithm, or in other words in an alternating way, the final solution is closely related to the initialization. We introduce two different initialization strategies. The first is random initialization: we generate random elements to form the matrices of the corresponding sizes. In the following experiments, this initialization is used unless otherwise specified. The second is an empirical method, in which the initial matrices are set according to prior knowledge of the data.
When solving the LS-STRM-SMT, the penalty parameter λ is selected from a prescribed candidate set. Another important parameter that may influence the result is the rank r. Experiments show that the mean square error (MSE), which we use to measure whether the solutions to the LS-STRM-SMT problems are good or not, decreases as r grows; that is, the larger r is, the smaller the MSEs are. But this phenomenon is not very pronounced, and when r increases the computational complexity rises quickly. We give the MSE for different r for two data sets, and in the other experiments we simply fix a small r. Besides, the parameter λ is selected by fivefold cross validation. In this section, we solve tensor regression problems for third-order data with our LS-STRM-SMT algorithm and compare it with some other regression methods; three questions are discussed. First, how does the MSE for different data change as r increases? Second, how does our algorithm compare with support vector regression (SVR), least square support vector regression (LSSVR), regularized matrix regression (RMR), and lasso regression [19, 20]? The last question discussed is the convergence of our algorithm.
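Fivefold cross validation for the penalty parameter can be sketched generically; the helper below is ours (the paper does not specify one), and the ridge-regression learner is only a stand-in for the actual LS-STRM-SMT trainer, plugged in to make the example self-contained:

```python
import numpy as np

def five_fold_cv(fit, X, y, lams, seed=0):
    """Return the penalty parameter with the lowest mean 5-fold validation
    MSE.  `fit(Xtr, ytr, lam)` must return a predictor callable."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, 5)
    best, best_mse = None, np.inf
    for lam in lams:
        errs = []
        for k in range(5):
            val = folds[k]
            tr = np.concatenate([folds[j] for j in range(5) if j != k])
            predict = fit(X[tr], y[tr], lam)
            errs.append(np.mean((predict(X[val]) - y[val]) ** 2))
        if np.mean(errs) < best_mse:
            best, best_mse = lam, np.mean(errs)
    return best

# Stand-in learner: ridge regression with the same 1/lam-style penalty.
def ridge_fit(Xtr, ytr, lam):
    d = Xtr.shape[1]
    w = np.linalg.solve(Xtr.T @ Xtr + np.eye(d) / lam, Xtr.T @ ytr)
    return lambda Xte: Xte @ w

rng = np.random.default_rng(2)
X = rng.standard_normal((60, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(60)
lam_best = five_fold_cv(ridge_fit, X, y, [2.0 ** k for k in range(-4, 5)])
```

The same `five_fold_cv` wrapper works for any learner that exposes a fit-then-predict interface, so the λ search is independent of the model being tuned.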
The information of the data used is listed in Table 1. Tables 2 and 3 show the influence of different r for the artificial data and Ticdata2000, respectively, and the comparison of our algorithm with the others is in Table 4. From Tables 2 and 3, we know that, with the increase of r, the MSE tends to reduce gradually. But the effect is not very pronounced, and the largest r does not always mean the best performance. In Figure 2, (a) and (b) show the relation between r and the MSE more intuitively. From (a) and (b) in Figure 3, which show the average MSE for different r for the artificial data and Ticdata2000, respectively, we can easily conclude that when r becomes large the change of the MSE is small, so we can use a small r to reduce computational complexity.
Table 2: MSE for different r on Ticdata2000.
Table 3: MSE for different r on the artificial data.
Table 4: MSE of different algorithms on different data sets.
In Table 4, the variance is given by the number in parentheses, while the numbers outside the parentheses denote the MSE of the different algorithms on the different data sets. Since a vector sample can be reformulated as matrices (tensors) of different sizes, numbers in italic denote the average value over the different data sizes. The numbers in bold denote the best performance. It is clear that our LS-STRM-SMT always has better performance, which is evident in Figure 4. More details are included in Tables 5, 6, and 7.
Table 5: RMR for Slump and ConcreteData of different data size, respectively.
Table 6: RMR for artificial data, Ticdata2000, and BlogData of different data size, respectively.
Table 7: LS-STRM-SMT for BlogData of different data size.
In order to show the convergence behavior, we compute the objective value of our method for the artificial data and Ticdata2000. The objective function values over 20 iterations are shown in Figure 1. There are two main observations from the result. First, the objective values of our algorithm are extremely small, which implies that our algorithm performs well to some degree. Second, as the iterations increase, the objective value tends to a fixed number, which indicates the convergence of our algorithm.
Figure 1: Convergence of LS-STRM-SMT.
Figure 2: MSE for different r on the artificial data and Ticdata2000.
Figure 3: Average MSE for different r on the artificial data and Ticdata2000.
Figure 4: Comparison of our algorithm and the other algorithm that performs best.
In Figure 1, the abscissa and the ordinate represent the iterations and the objective value, respectively, and the curves with different colors and markers in the same figure stand for the relationship between objective value and iterations for the same data at different data sizes.
In Figure 2, the abscissa and the ordinate represent the parameter r and the MSE, respectively, and the line graphs with different colors and markers in the same figure stand for the relationship between r and the MSE for the same data at different data sizes.
Figure 3 shows the relationship between the average MSE and r for the artificial data and Ticdata2000, respectively.
Figure 4 is plotted based on Tables 2, 3, and 4. In (a) the solid line stands for the performance of our algorithm, while the dashed line stands for the best performance of the other algorithms. In (b) the blue bars represent the performance of our algorithm, while the brown bars represent the best performance of the other algorithms. From Figure 4, we can easily conclude that our method has better performance.
6. Conclusions
In this paper, we propose a novel method for tensor regression problems, called least square support tensor regression machine based on submatrix of the tensor (LS-STRM-SMT), which is inspired by the idea of multiple rank multilinear SVM for matrix data classification (MRMLSVM). We give a dual fixed point algorithm (DFPA) to solve the resulting problems. However, LS-STRM-SMT is not a straightforward extension of MRMLSVM. On one hand, MRMLSVM is a method for matrix data classification, while LS-STRM-SMT is used to solve tensor data regression problems; on the other hand, LS-STRM-SMT and MRMLSVM are connected by the introduction of the submatrix of the tensor, which reformulates the LS-STRM-SMT problem as a series of LS-SMRM problems that can be solved in a way similar to MRMLSVM. Moreover, LS-STRM-SMT differs from traditional approaches, which destroy the structural information of the tensor completely; it preserves the structural information of the tensor to some extent, which may be the reason why our method performs better than some other proposed algorithms. Besides, a small parameter r decreases the dimension of the unknowns, which helps avoid overfitting to some degree. In addition, from a practical point of view, the preliminary numerical experiments have also shown that the performance of our LS-STRM-SMT is better.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (no. 11561066).
References
[1] J. Yang, D. Zhang, and A. F. Frangi, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.
[2] A. Kong, D. Zhang, and M. Kamel, "A survey of palmprint recognition," Pattern Recognition, vol. 42, no. 7, pp. 1408–1418, 2009.
[3] J. J. Koo, A. C. Evans, and W. J. Gross, "3-D brain MRI tissue classification on FPGAs," IEEE Transactions on Image Processing, vol. 18, no. 12, pp. 2735–2746, 2009.
[4] D. Le Bihan, J.-F. Mangin, C. Poupon et al., "Diffusion tensor imaging: concepts and applications," Journal of Magnetic Resonance Imaging, vol. 13, no. 4, pp. 534–546, 2001.
[5] R. Abraham, J. E. Marsden, and T. Ratiu, Manifolds, Tensor Analysis, and Applications, vol. 2 of Global Analysis Pure and Applied: Series B, Addison-Wesley, Reading, Mass, USA, 1983.
[6] W. Guo, I. Kotsia, and I. Patras, "Tensor learning for regression," IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 816–827, 2012.
[7] H. Zhou, L. Li, and H. Zhu, "Tensor regression with applications in neuroimaging data analysis," Journal of the American Statistical Association, vol. 108, no. 502, pp. 540–552, 2013.
[8] J. Wright, A. Y. Yang, and A. Ganesh, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[9] G. Lu, D. Zhang, and K. Wang, "Palmprint recognition using eigenpalms features," Pattern Recognition Letters, vol. 24, no. 9-10, pp. 1463–1467, 2003.
[10] D. Cai, X. He, and J.-R. Wen, "Support tensor machines for text categorization," Tech. Rep., 2006.
[11] Z. Hao, L. He, B. Chen, and X. Yang, "A linear support higher-order tensor machine for classification," IEEE Transactions on Image Processing, vol. 22, no. 7, pp. 2911–2920, 2013.
[12] C. Hou, F. Nie, C. Zhang, D. Yi, and Y. Wu, "Multiple rank multi-linear SVM for matrix data classification," Pattern Recognition, vol. 47, no. 1, pp. 454–469, 2014.
[13] M. Signoretto, Q. Tran Dinh, L. De Lathauwer, and J. A. K. Suykens, "Learning with tensors: a framework based on convex optimization and spectral regularization," Machine Learning, vol. 94, no. 3, pp. 303–351, 2014.
[14] K. Goebel and W. A. Kirk, "A fixed point theorem for asymptotically nonexpansive mappings," Proceedings of the American Mathematical Society, vol. 35, pp. 171–174, 1972.
[15] H. Pirsiavash, D. Ramanan, and C. Fowlkes, "Bilinear classifiers for visual recognition," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009), pp. 1482–1490, December 2009.
[16] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
[17] J. A. K. Suykens, L. Lukas, P. Van Dooren et al., Least Squares Support Vector Machine Classifiers: A Large Scale Algorithm, 2000.
[18] H. Zhou and L. Li, "Regularized matrix regression," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 76, no. 2, pp. 463–483, 2014.
[19] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.
[20] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.