A Survey of Support Vector Machines with Uncertainties

https://link.springer.com/content/pdf/10.1007%2Fs40745-014-0022-8.pdf

Ann. Data. Sci. (2014) 1(3–4):293–309
DOI 10.1007/s40745-014-0022-8

A Survey of Support Vector Machines with Uncertainties

Ximing Wang · Panos M. Pardalos

Received: 10 October 2014 / Revised: 1 November 2014 / Accepted: 10 December 2014 / Published online: 17 January 2015
© Springer-Verlag Berlin Heidelberg 2015

Abstract Support Vector Machines (SVM) are one of the well-known classes of supervised learning algorithms. In recent years SVM have been applied in many fields, and many algorithmic and modeling variations have been developed. Basic SVM models deal with the situation where the exact values of the data points are known. This paper presents a survey of SVM when the data points are uncertain. When a direct model cannot guarantee generally good performance on the uncertainty set, robust optimization is introduced to deal with the worst-case scenario and still guarantee optimal performance. The data uncertainty may take the form of additive noise bounded in norm, for which efficient linear programming models are presented under certain conditions; of intervals with support and extremum values; or, more generally, of polyhedral uncertainties, for which formulations are also presented. Another branch of uncertainty analysis is the chance-constrained SVM, which ensures a small probability of misclassification for the uncertain data. The multivariate Chebyshev inequality and Bernstein bounding schemes have been used to transform the chance constraints through robust optimization. The Chebyshev-based model employs moment information of the uncertain training points. The Bernstein bounds can be less conservative than the Chebyshev bounds since they employ both support and moment information, but they also rely on the strong assumption that all the elements in the data set are independent.

Keywords Support vector machines · Robust optimization · Bounded norm · Polyhedral uncertainties · Chance constraints

X. Wang (corresponding author) · P. M. Pardalos
Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA

1 Introduction

As one of the well-known supervised learning algorithms, Support Vector Machines (SVM) are gaining more and more attention. SVM were proposed by Vapnik [1,2] as maximum-margin classifiers, and tutorials on SVM can be found in [3–6]. In recent years, SVM have been applied to many fields and have acquired many algorithmic and modeling variations. In the biomedical field, SVM have been used to identify physical diseases [7–10] as well as psychological diseases [11]. Electroencephalography (EEG) signals can also be analyzed using SVM [12–14]. In addition, SVM have been applied to protein prediction [15–19] and medical images [20–22]. Applications in computer vision include person identification [23], hand gesture detection [24], face recognition [25] and background subtraction [26]. In geosciences, SVM have been applied to remote sensing analysis [27–29], land cover change [30–32], landslide susceptibility [33–36] and hydrology [37,38]. In power systems, SVM have been used for transient status prediction [39], power load forecasting [40], electricity consumption prediction [41] and wind power forecasting [42]. SVM have also been used in stock price forecasting [43–45] and business administration [46]. Other applications of SVM include plant disease detection in agriculture [47], condition monitoring [48], network security [49] and electronics [50,51]. When basic SVM models cannot satisfy the requirements of an application, different modeling variations of SVM can be found in [52].

In this paper, a survey of SVM with uncertainties is presented. Basic SVM models deal with the situation in which the exact values of the data points are known. When the data points are uncertain, different models have been proposed to formulate the SVM with uncertainties.
Bi and Zhang [53] assumed that the data points are subject to additive noise bounded in norm, and proposed a very direct model. However, this model cannot guarantee generally good performance on the uncertainty set. To guarantee optimal performance while the worst-case scenario constraints are still satisfied, robust optimization is used. Trafalis et al. [54–58] proposed a robust optimization model for the case where the perturbation of the uncertain data is bounded in norm. Ghaoui et al. [59] derived a robust model for uncertainty expressed as intervals. Fan et al. [60] studied the more general case of polyhedral uncertainties. Robust optimization is also used when the constraint is a chance constraint, which ensures a small probability of misclassification for the uncertain data. The chance constraints are transformed by different bounding inequalities, for example the multivariate Chebyshev inequality [61,62] and Bernstein bounding schemes [63].

The organization of this paper is as follows: Sect. 2 gives an introduction to the basic SVM models. Section 3 presents the SVM with uncertainties, covering both the robust SVM with bounded uncertainty and the chance-constrained SVM through robust optimization. Section 4 presents concluding remarks and suggestions for further research.

2 Basic SVM Models

Support Vector Machines construct maximum-margin classifiers, such that small perturbations in the data are least likely to cause misclassification. Empirically, SVM work well; they are well-known supervised learning algorithms proposed by Vapnik [1,2]. Suppose we have a two-class dataset of $m$ data points $\{x_i, y_i\}_{i=1}^{m}$ with $n$-dimensional features $x_i \in \mathbb{R}^n$ and respective class labels $y_i \in \{+1, -1\}$. For linearly separable datasets, there exists a hyperplane $w^\top x + b = 0$ that separates the two classes, and the corresponding classification rule is based on $\operatorname{sign}(w^\top x + b)$.
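As a minimal sketch of this classification rule, the following Python snippet evaluates w^T x + b for a given point and returns its sign. The function names and the numerical values of w and b are illustrative assumptions and are not taken from any of the surveyed models. A point lying exactly on the hyperplane is assigned to the −1 class here, since only strictly positive values are mapped to +1.

```python
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Return w^T x + b for a single point x."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension")
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Assign +1 when w^T x + b is strictly positive, and -1 otherwise."""
    return 1 if decision_value(w, b, x) > 0 else -1


# Hypothetical separating hyperplane x1 + x2 - 3 = 0 in R^2
# (values chosen only for illustration).
w = [1.0, 1.0]
b = -3.0

# One point on each side of the hyperplane.
points = [[3.0, 2.0], [0.5, 1.0]]
labels = [classify(w, b, x) for x in points]
print(labels)
```

In this sketch the first point has decision value 2.0 and the second has −1.5, so they fall on opposite sides of the hyperplane.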
If $w^\top x + b$ is positive, $x$ is assigned to the $+1$ class; otherwise, to the $-1$ class. The data points that the margin pushes up against are called support vectors. A maximum-margin hyperplane is one that maximizes the distance between the hyperplane and the support vectors. For the separating hyperplane $w^\top x + b = 0$, $w$ and $b$ can be normalized so that $w^\top x + b = +1$ passes through the support vectors of the $+1$ class and $w^\top x + b = -1$ passes through the support vectors of the $-1$ class. The distance between these two hyperplanes, i.e., the margin width, is $2/\|w\|_2$; therefore, maximizing the margin is equivalent to minimizing $\frac{1}{2}\|w\|_2^2$ subject to the separation constraints. This can be expressed as the following quadratic optimization problem:

$$\min_{w,b} \; \frac{1}{2}\|w\|_2^2 \tag{1a}$$

$$\text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \quad i = 1, \ldots, m \tag{1b}$$

Introducing Lagrange multipliers $\alpha = [\alpha_1, \ldots, \alpha_m]$, the above constrained problem can be expressed as:

$$\min_{w,b} \max_{\alpha \ge 0} \; L(w, b, \alpha) = \frac{1}{2}\|w\|_2^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right] \tag{2}$$

Taking the derivatives with respect to $w$ and $b$ and setting them to zero gives:

$$\frac{\partial L(w, b, \alpha)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i \tag{3a}$$

∂L ( (...truncated)