A Survey of Support Vector Machines with Uncertainties
Ann. Data. Sci. (2014) 1(3–4):293–309
DOI 10.1007/s40745-014-0022-8
Ximing Wang · Panos M. Pardalos
Received: 10 October 2014 / Revised: 1 November 2014 / Accepted: 10 December 2014 /
Published online: 17 January 2015
© Springer-Verlag Berlin Heidelberg 2015
Abstract Support Vector Machines (SVM) are one of the well-known classes of
supervised learning algorithms. In recent years SVM have been applied in many fields
and have given rise to many algorithmic and modeling variations. Basic SVM models
deal with the situation where the exact values of the data points are known. This
paper presents a survey of SVM when the data points are uncertain. When a direct
model cannot guarantee generally good performance on the uncertainty set, robust
optimization is introduced to handle the worst-case scenario while still guaranteeing
optimal performance. The data uncertainty can be an additive noise bounded in norm,
for which some efficient linear programming models are presented under certain
conditions; it can be given as intervals with support and extremum values; or it can
take the more general form of polyhedral uncertainties, for which formulations are
presented. Another field of uncertainty analysis is chance constrained SVM, which is
used to ensure a small probability of misclassification for the uncertain data. The
multivariate Chebyshev inequality and Bernstein bounding schemes have been used to
transform the chance constraints through robust optimization. The Chebyshev based
model employs moment information of the uncertain training points. The Bernstein
bounds can be less conservative than the Chebyshev bounds since they employ both
support and moment information, but they also make the strong assumption that all
the elements in the data set are independent.
Keywords Support vector machines · Robust optimization · Bounded norm ·
Polyhedral uncertainties · Chance constraints
X. Wang (B) · P. M. Pardalos
Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA
e-mail:
P. M. Pardalos
e-mail:
1 Introduction
As one of the well-known supervised learning algorithms, Support Vector Machines
(SVM) are gaining more and more attention. They were proposed by Vapnik [1,2] as
maximum-margin classifiers, and tutorials on SVM can be found in [3–6]. In recent
years, SVM have been applied to many fields and have many algorithmic and modeling
variations. In the biomedical field, SVM have been used to identify physical diseases
[7–10] as well as psychological diseases [11]. Electroencephalography (EEG) signals
can also be analyzed using SVM [12–14]. Besides these, SVM have also been applied to
protein prediction [15–19] and medical images [20–22]. Computer vision includes many
applications of SVM, such as person identification [23], hand gesture detection [24],
face recognition [25] and background subtraction [26]. In geosciences, SVM have been
applied to remote sensing analysis [27–29], land cover change [30–32], landslide
susceptibility [33–36] and hydrology [37,38]. In power systems, SVM have been used
for transient status prediction [39], power load forecasting [40], electricity
consumption prediction [41] and wind power forecasting [42]. SVM can also be used for
stock price forecasting [43–45] and business administration [46]. Other applications
of SVM include agricultural plant disease detection [47], condition monitoring [48],
network security [49] and electronics [50,51]. When basic SVM models cannot satisfy
the application requirements, different modeling variations of SVM can be found in [52].
In this paper, a survey of SVM with uncertainties is presented. Basic SVM models
deal with the situation in which the exact values of the data points are known. When
the data points are uncertain, different models have been proposed to formulate the
SVM with uncertainties. Bi and Zhang [53] assumed the data points are subject to an
additive noise bounded in norm and proposed a very direct model. However, this model
cannot guarantee generally good performance on the uncertainty set. To guarantee
optimal performance while the worst-case scenario constraints are still satisfied,
robust optimization is utilized. Trafalis et al. [54–58] proposed a robust
optimization model for the case where the perturbation of the uncertain data is
bounded in norm. Ghaoui et al. [59] derived a robust model when the uncertainty is
expressed as intervals. Fan et al. [60] studied the more general case of polyhedral
uncertainties. Robust optimization is also used when the constraint is a chance
constraint, which ensures a small probability of misclassification for the uncertain
data. The chance constraints are transformed by different bounding inequalities, for
example the multivariate Chebyshev inequality [61,62] and Bernstein bounding schemes [63].
This paper is organized as follows. Section 2 introduces the basic SVM models.
Section 3 presents SVM with uncertainties, covering both robust SVM with bounded
uncertainty and chance constrained SVM through robust optimization. Section 4
presents concluding remarks and suggestions for further research.
2 Basic SVM Models
Support Vector Machines construct maximum-margin classifiers, such that small
perturbations in the data are least likely to cause misclassification. Empirically,
SVM work very well and are well-known supervised learning algorithms proposed by
Vapnik [1,2]. Suppose we have a two-class dataset of m data points
$\{x_i, y_i\}_{i=1}^{m}$ with n-dimensional features $x_i \in \mathbb{R}^n$ and
respective class labels $y_i \in \{+1, -1\}$. For linearly separable datasets, there
exists a hyperplane $w^\top x + b = 0$ that separates the two classes, and the
corresponding classification rule is based on $\operatorname{sign}(w^\top x + b)$. If
this value is positive, x is classified into the +1 class; otherwise, into the −1 class.
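The classification rule above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original paper: the function names `decision_value` and `classify`, the example hyperplane and the test points are all assumptions chosen for the example.

```python
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Return w^T x + b for a single point x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 if w^T x + b is positive, otherwise -1."""
    return 1 if decision_value(w, b, x) > 0 else -1


# Hypothetical hyperplane x1 + x2 = 0, i.e. w = (1, 1) and b = 0.
w, b = (1.0, 1.0), 0.0
points = [(2.0, 1.0), (-1.0, -3.0), (0.5, -0.5)]
labels = [classify(w, b, x) for x in points]
print(labels)  # [1, -1, -1]
```

The last point lies exactly on the hyperplane, so its decision value is 0 and, following the rule as stated, it is assigned to the −1 class.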
The data points that the margin pushes up against are called support vectors. A
maximum-margin hyperplane is one that maximizes the distance between the hyperplane
and the support vectors. For the separating hyperplane $w^\top x + b = 0$, w and b
can be normalized so that $w^\top x + b = +1$ goes through the support vectors of the
+1 class, and $w^\top x + b = -1$ goes through the support vectors of the −1 class.
A step of length t along the unit normal $w/\|w\|_2$ changes $w^\top x + b$ by
$t\|w\|_2$, so the distance between these two hyperplanes, i.e., the margin width, is
$2/\|w\|_2$. Therefore, maximization of the margin can be performed as minimization
of $\frac{1}{2}\|w\|_2^2$ subject to the separation constraints. This can be
expressed as the following quadratic optimization problem:
\begin{align}
\min_{w,b} \quad & \frac{1}{2}\|w\|_2^2 \tag{1a} \\
\text{s.t.} \quad & y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, m \tag{1b}
\end{align}
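As a rough illustration of problem (1), the sketch below evaluates the objective (1a) and the constraints (1b) for a few candidate hyperplanes on a hypothetical two-point dataset. It is not a solver, only a check of feasibility and objective value; the helper names, the data and the candidate values are assumptions made for this example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def objective(w):
    """Objective (1a): 0.5 * ||w||_2^2."""
    return 0.5 * dot(w, w)


def is_feasible(w, b, X, y, tol=1e-12):
    """Constraints (1b): y_i (w^T x_i + b) >= 1 for every data point."""
    return all(yi * (dot(w, xi) + b) >= 1.0 - tol for xi, yi in zip(X, y))


def margin_width(w):
    """Distance between the hyperplanes w^T x + b = +1 and w^T x + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical toy data: one point per class.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]

# Hypothetical candidate (w, b) pairs, all sharing the same direction.
candidates = {
    "large_w": ((1.0, 1.0), 0.0),
    "tight_w": ((0.5, 0.5), 0.0),
    "small_w": ((0.25, 0.25), 0.0),
}

feasible = {
    name: wb for name, wb in candidates.items() if is_feasible(wb[0], wb[1], X, y)
}
best = min(feasible, key=lambda name: objective(feasible[name][0]))

for name, (w, b) in candidates.items():
    print(name, is_feasible(w, b, X, y), objective(w), round(margin_width(w), 4))
print("best feasible candidate:", best)
```

Among these candidates, `tight_w` is feasible with the smallest objective value (0.25) and hence the widest margin, $2\sqrt{2}$, which equals the distance between the two points. `small_w` would give a wider margin still, but it violates (1b).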
Introducing Lagrange multipliers $\alpha = [\alpha_1, \dots, \alpha_m]$, the above
constrained problem can be expressed as:
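\begin{equation}
\min_{w,b} \max_{\alpha \ge 0} \; \mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|_2^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right] \tag{2}
\end{equation}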
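The equivalence between the min-max form (2) and problem (1) rests on a standard argument, sketched here for a fixed pair (w, b). It is added as a reading aid and is not taken from the original text.

```latex
\max_{\alpha \ge 0} \mathcal{L}(w, b, \alpha) =
\begin{cases}
\frac{1}{2}\|w\|_2^2, & \text{if } y_i (w^\top x_i + b) \ge 1 \text{ for all } i = 1, \dots, m, \\
+\infty, & \text{otherwise.}
\end{cases}
```

If every constraint holds, each term $-\alpha_i [y_i (w^\top x_i + b) - 1]$ is nonpositive, so the maximum is attained with these terms equal to zero. If some constraint is violated, its term grows without bound as $\alpha_i \to \infty$. Minimizing over (w, b) therefore recovers problem (1).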
Taking the derivatives with respect to w and b and setting them to zero gives:
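\begin{equation}
\frac{\partial \mathcal{L}(w, b, \alpha)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i \tag{3a}
\end{equation}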
m
∂L ( (...truncated)