One-class classifiers with incremental learning and forgetting for data streams with concept drift
Bartosz Krawczyk
Michał Woźniak
This work was supported by the Polish National Science Center under Grant No. DEC-2013/09/B/ST6/02264.
One of the most important challenges for the machine learning community is to develop efficient classifiers that can cope with data streams, especially in the presence of so-called concept drift. This phenomenon changes the characteristics of the classification task and forces the learning model to adapt to the current state of the environment. One-class classification is widely believed to be a promising research direction for data stream analysis: it can be used for binary classification without access to counterexamples, for decomposing a multi-class data stream, for outlier detection, and for novel class recognition. This paper reports a novel modification of the weighted one-class support vector machine, adapted to non-stationary streaming data analysis. Our proposal can deal with gradual concept drift, as the introduced one-class classifier can adapt its decision boundary to new, incoming data and additionally employs a forgetting mechanism that boosts the ability of the classifier to follow changes in the model. We propose several strategies for incremental learning and forgetting and evaluate them on several real data streams. The obtained results confirm the usability of the proposed classifier for data stream classification in the presence of concept drift. Additionally, the implemented forgetting mechanism ensures limited memory consumption, because only sufficiently recent and valuable examples need to be memorized.
Contemporary computer systems manage and store
enormous amounts of data. It is predicted that the volume of stored
information will double every two years. People send 14
billion e-mails and more than 350 million tweets per day. The
huge chains of discount department stores (such as Walmart Inc.)
register more than 1 million transactions per hour.
Therefore, market-leading companies want to develop smart
analytic tools, based on machine learning, which
can analyze such enormous amounts of data. Additionally,
the design of such analytical tools should take into
consideration that most data arrive continuously in the form of
a so-called data stream (Gama 2010). Furthermore, the
relations within the data, i.e., the statistical dependencies
characterizing a given phenomenon (such as client behavior), may
change (Gama 2012). This observation calls for a special
analytical model that can cope with such non-stationary
characteristics. Initially, data streams originated
in the financial markets. Today, data streams can be found
everywhere: in the Internet, monitoring systems, sensor
networks and other domains (Hulten et al. 2009). Data streams
differ from traditional static data, because they can be
viewed as an infinite amount of data that arrives continuously,
where memory and computational complexity play
crucial roles. For this reason, mining data streams poses many new
challenges to contemporary machine learning systems
(Aggarwal et al. 2004).
In this paper, we mainly focus on the classification
task, which is a widely used analytical approach (Duda et al.
2001). Basically, it aims at assigning a given observation to
one of the predefined categories. Such a situation can be found,
e.g., in spam filtering, biometrics, medical decision support,
or fraud detection, to name only a few. Concept drift
in the classification model means that the statistical
dependencies between the attributes describing an object and its predefined
label may change over time.
To explain the possible types of changes, let us
briefly introduce a statistical classification model. This
theory assumes (Duda et al. 2001) that both the attributes
describing an object, $x \in \mathcal{X} \subseteq \mathbb{R}^d$, and its correct
classification (class label), $j \in \mathcal{M} = \{1, 2, \ldots, M\}$, are observed
values of a pair of random variables $(X, J)$. Their probability
distribution is given by the prior class probabilities
$p_j = P(J = j), \quad j \in \mathcal{M}$ (1)
and the class-conditional probability density functions of $X$, $f_j(x) = f(x \mid j)$.
With respect to these probabilistic characteristics, two types of concept drift can be distinguished:
- Virtual concept drift means that the changes do not impact the
posterior probabilities, but affect the conditional
probability density functions (Widmer and Kubat 1993).
- Real concept drift means that the changes affect the posterior
probabilities and may impact the unconditional probability
density function (Schlimmer and Granger 1986; Widmer
and Kubat 1996).
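The difference between the two drift types can be checked numerically. The sketch below is not taken from the paper: it assumes a hypothetical discrete feature with three values and two classes, and builds the joint distribution from made-up posteriors $P(j \mid x)$ and a made-up unconditional distribution $P(x)$. The helper names `joint_from`, `describe`, and `bayes_decision`, as well as all numbers, are illustrative assumptions.

```python
import numpy as np

def joint_from(post_class1, p_x):
    """Joint P(j, x) for two classes from P(j=1 | x) and P(x) (rows: classes)."""
    post_class1 = np.asarray(post_class1, dtype=float)
    posterior = np.vstack([1.0 - post_class1, post_class1])
    return posterior * np.asarray(p_x, dtype=float)

def describe(joint):
    """Return priors p_j, class-conditionals f_j(x), and posteriors p_j(x)."""
    priors = joint.sum(axis=1)
    conditionals = joint / priors[:, None]
    posteriors = joint / joint.sum(axis=0)
    return priors, conditionals, posteriors

def bayes_decision(posteriors):
    """Pick the class with the largest posterior for every value of x."""
    return posteriors.argmax(axis=0)

# Illustrative numbers only (assumed, not from the paper).
before = joint_from([0.9, 0.6, 0.1], [0.5, 0.3, 0.2])
virtual = joint_from([0.9, 0.6, 0.1], [0.2, 0.3, 0.5])  # only P(x) changes
real = joint_from([0.1, 0.6, 0.9], [0.5, 0.3, 0.2])     # P(j | x) changes

for name, joint in [("before", before), ("virtual", virtual), ("real", real)]:
    priors, conditionals, posteriors = describe(joint)
    print(name, "priors:", priors.round(3),
          "decisions:", bayes_decision(posteriors))
```

In the virtual case only $P(x)$ is altered, so the class-conditional distributions (and, in this particular example, the priors) change while the posteriors and the resulting decisions stay the same. In the real case the posteriors themselves change, and the decisions change with them.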
From the classification point of view, real concept drift is
important because it can strongly affect the shape of the
decision boundary. Virtual drift does not affect the decision
rule, since the Bayes decision rule, Eq. (4), depends only on the
posterior probabilities. Another drift taxonomy depends on how
rapidly the change occurs, and here we can distinguish:
- slow changes, i.e., gradual or incremental drift;
- abrupt changes, i.e., sudden drift.
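To make the two rates of change concrete, the following sketch generates a one-dimensional synthetic stream whose mean either jumps at a given time step (sudden drift) or moves linearly over a transition window (a simple form of incremental drift). It is an illustration under assumed settings, not a generator used in the paper; the names `concept_mean` and `make_stream`, the Gaussian noise, and all parameter values are hypothetical.

```python
import numpy as np

def concept_mean(t, t_start, width, old_mean=0.0, new_mean=3.0):
    """Mean of the generating distribution at time step t.

    width == 0 models sudden drift; width > 0 spreads the change
    linearly over `width` steps (a simple form of incremental drift).
    """
    if width == 0:
        alpha = 1.0 if t >= t_start else 0.0
    else:
        alpha = min(max((t - t_start) / width, 0.0), 1.0)
    return (1.0 - alpha) * old_mean + alpha * new_mean

def make_stream(n, t_start, width, seed=0):
    """Draw n one-dimensional observations around the drifting mean."""
    rng = np.random.default_rng(seed)
    means = np.array([concept_mean(t, t_start, width) for t in range(n)])
    return means + rng.normal(0.0, 1.0, size=n), means

# Same drift onset, different speed of change.
sudden_x, sudden_means = make_stream(1000, t_start=500, width=0)
slow_x, slow_means = make_stream(1000, t_start=500, width=200)
```

With `width=0` the mean switches between two consecutive time steps, whereas with `width=200` a classifier that is updated chunk by chunk has a chance to follow the change.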
The presence of concept drift can lead to a serious
deterioration of classifier accuracy (Lughofer and Angelov 2011).
This is depicted in Fig. 1, where two types of concept drift
are shown.
Additionally, we can consider recurring concept drift.
It may occur in the case of, e.g., seasonal phenomena such as weather
prediction or customer preferences in clothing or sports stores
(Widmer and Kubat 1996). Therefore, developing efficient
methods that are able to deal with this type of change in
data streams is nowadays the focus of intense research.
The main aim of this paper is to introduce an efficient
method of incremental data stream classification, i.e., we
consider the task where the concept drift is rather smooth and
the classifier tries to follow the changes in the underlying model.
As the implementation of the classifier, we propose a novel
modification of the weighted one-class classifier, which tunes
the shape of its decision boundary on the basis of weights
assigned to the training objects. The main contribution of this
work is a novel one-class classifier applied to this task, with
built-in adaptation and forgetting mechanisms. The
evaluation of the proposed method is carried out by means of
computer experiments on real and semi-synthetic data sets.
The outline of the work is as follows. First, related work
on data stream classification and one-class classifiers is
presented. Then, the original algorithm is described.
The following section focuses on the results of the
experimental research. The last part concludes the paper.
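The general idea of incremental learning with forgetting can be sketched with a deliberately simplified stand-in. The code below is not the weighted one-class support vector machine proposed in this paper: it assumes a toy one-class model (a weighted centroid with a distance threshold) and an exponential weight decay as the forgetting rule. The class name `ForgettingCentroidOCC` and the parameters `decay`, `min_weight`, and `quantile` are hypothetical.

```python
import numpy as np

class ForgettingCentroidOCC:
    """Toy one-class model: weighted centroid plus radius, with weight decay.

    A simplified stand-in for illustration, NOT the paper's classifier.
    """

    def __init__(self, decay=0.9, min_weight=0.05, quantile=0.95):
        self.decay = decay            # factor applied to old weights per chunk
        self.min_weight = min_weight  # objects below this weight are forgotten
        self.quantile = quantile      # fraction of stored objects inside radius
        self.X = None
        self.w = None

    def partial_fit(self, chunk):
        chunk = np.asarray(chunk, dtype=float)
        if self.X is None:
            self.X, self.w = chunk, np.ones(len(chunk))
        else:
            # Forgetting: age the stored objects, drop the faded ones.
            aged = self.w * self.decay
            keep = aged >= self.min_weight
            # Incremental learning: new objects enter with full weight.
            self.X = np.vstack([self.X[keep], chunk])
            self.w = np.concatenate([aged[keep], np.ones(len(chunk))])
        # Newer objects pull the centre harder than older ones.
        self.center = np.average(self.X, axis=0, weights=self.w)
        dist = np.linalg.norm(self.X - self.center, axis=1)
        self.radius = np.quantile(dist, self.quantile)  # unweighted quantile
        return self

    def predict(self, X):
        """True for objects accepted as members of the target class."""
        X = np.asarray(X, dtype=float)
        return np.linalg.norm(X - self.center, axis=1) <= self.radius
```

Because weights decay geometrically and faded objects are discarded, the number of stored objects stays bounded regardless of the stream length, which mirrors the limited-memory argument made for the forgetting mechanism.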
2 Related works
In this section, a short overview of the related fields is given.
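$f_j(x) = f(x \mid j), \quad x \in \mathcal{X}, \; j \in \mathcal{M}.$ (2)
The classification algorithm $\Psi$ maps the feature space $\mathcal{X}$
to the set of defined class labels $\mathcal{M}$:
$\Psi : \mathcal{X} \to \mathcal{M}.$ (3)
If the probability characteristics given by Eqs. (1) and (2)
are known, then the optimal classifier $\Psi^*$, minimizing the
misclassification probability, makes decisions according to
the following rule:$^{1}$
$\Psi^*(x) = i \quad \text{if} \quad p_i(x) = \max_{k \in \mathcal{M}}$ (...truncated)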