One-class classifiers with incremental learning and forgetting for data streams with concept drift

https://link.springer.com/content/pdf/10.1007%2Fs00500-014-1492-5.pdf

Bartosz Krawczyk, Michał Woźniak

This work was supported by the Polish National Science Center under the Grant No. DEC-2013/09/B/ST6/02264.

One of the most important challenges for the machine learning community is to develop efficient classifiers that can cope with data streams, especially in the presence of so-called concept drift. This phenomenon changes the characteristics of the classification task and forces the learning model to adapt to the current state of the environment. There is a strong belief that one-class classification is a promising research direction for data stream analysis: it can be used for binary classification without access to counterexamples, for decomposing a multi-class data stream, for outlier detection, or for novel class recognition. This paper reports a novel modification of the weighted one-class support vector machine, adapted to non-stationary streaming data analysis. Our proposition can deal with gradual concept drift, as the introduced one-class classifier can adapt its decision boundary to new, incoming data and additionally employs a forgetting mechanism that boosts the classifier's ability to follow the model changes. We propose several strategies for incremental learning and forgetting and evaluate them on several real data streams. The obtained results confirm the usability of the proposed classifier for data stream classification in the presence of concept drift. Additionally, the implemented forgetting mechanism ensures limited memory consumption, because only recent and valuable examples need to be memorized.

1 Introduction

Contemporary computer systems manage and store enormous amounts of data. It is predicted that the volume of stored information will double every two years. People send 14 billion e-mails and more than 350 million tweets per day.
Large chains of discount department stores (such as Walmart Inc.) register more than 1 million transactions per hour. Therefore, market-leading companies want to develop smart analytic tools, based on machine learning, that can analyze such enormous amounts of data. The design of such tools should take into consideration that most data arrive continuously in the form of a so-called data stream (Gama 2010). Furthermore, the relations within the data, i.e., the statistical dependencies characterizing a given phenomenon (such as client behavior), may change (Gama 2012). This observation calls for a special analytical model that can cope with such non-stationary characteristics.

Data streams originated in the financial markets. Today, they can be found everywhere: in the Internet, monitoring systems, sensor networks, and other domains (Hulten et al. 2009). Data streams differ from traditional static data because they can be viewed as an infinite amount of data that arrives continuously, so memory and computational complexity play crucial roles. For this reason, mining data streams poses many new challenges to contemporary machine learning systems (Aggarwal et al. 2004).

In this paper, we focus mainly on the classification task, which is a widely used analytical approach (Duda et al. 2001). It aims at assigning a given observation to one of the predefined categories. Such a task arises, e.g., in spam filtering, biometrics, medical decision support, and fraud detection, to name only a few. Concept drift in the classification model means that the statistical dependencies between the attributes describing an object and its label can change over time. To explain the possible types of change, let us briefly introduce a statistical classification model. This theory assumes (Duda et al. 2001) that both the attributes describing an object $x \in \mathcal{X} \subseteq \mathbb{R}^d$ and its correct classification (class label) $j \in \mathcal{M} = \{1, 2, \ldots, M\}$ are observed values of a pair of random variables $(X, J)$. Their probability distribution is given by the prior class probabilities $p_j = P(J = j)$, $j \in \mathcal{M}$, and the class-conditional probability density functions of $X$.

Virtual concept drift means that the changes do not impact the posterior probabilities but affect the conditional probability density functions (Widmer and Kubat 1993).

Real concept drift means that the changes affect the posterior probabilities and may impact the unconditional probability density function (Schlimmer and Granger 1986; Widmer and Kubat 1996).

From the classification point of view, real concept drift is important because it can strongly affect the shape of the decision boundary. Virtual drift does not affect the decision rule, in particular the Bayes decision rule, Eq. (4). Another drift taxonomy depends on how rapidly the change occurs. Here we can distinguish slow changes, i.e., gradual or incremental drift, and abrupt changes, i.e., sudden drift. The presence of concept drift can seriously deteriorate a classifier's accuracy (Lughofer and Angelov 2011). This is depicted in Fig. 1, where two types of concept drift are shown. Additionally, we can consider recurring concept drift. It may occur, e.g., in seasonal phenomena such as weather prediction or the preferences of clients of clothing or sports stores (Widmer and Kubat 1996). Therefore, developing efficient methods that can deal with this type of change in a data stream is nowadays the focus of intense research.

The main aim of this paper is to introduce an efficient method of incremental data stream classification, i.e., we consider the task where the concept drift is rather smooth and the classifier model tries to follow the model's changes.
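The difference between slow and sudden drift, and the benefit of forgetting old data, can be illustrated with a toy one-dimensional stream. The following Python sketch is only an illustration under assumed settings and is not the weighted one-class classifier proposed here: the function names (`concept_mean`, `make_stream`, `static_estimate`, `windowed_estimate`), the Gaussian noise model, the linear transition between concepts, and all numeric parameters are hypothetical choices. It compares an estimate frozen after the first examples with one computed over a sliding window of recent examples.

```python
import random
from collections import deque


def concept_mean(t, mode, start=0.0, end=2.0, t_drift=300, width=400):
    """Mean of the (hypothetical) target concept at time step t."""
    if mode == "sudden":
        # Abrupt change: the concept jumps at t_drift.
        return start if t < t_drift else end
    if mode == "slow":
        # Slow change: linear transition over `width` steps after t_drift.
        frac = min(max((t - t_drift) / width, 0.0), 1.0)
        return start + frac * (end - start)
    raise ValueError(f"unknown drift mode: {mode!r}")


def make_stream(n, mode, noise=0.1, seed=0):
    """Generate n noisy observations of the drifting concept."""
    rng = random.Random(seed)
    return [rng.gauss(concept_mean(t, mode), noise) for t in range(n)]


def static_estimate(stream, n_init=100):
    """Estimate built once from the first n_init examples, never updated."""
    head = stream[:n_init]
    return sum(head) / len(head)


def windowed_estimate(stream, window=50):
    """Estimate over the most recent examples; older ones are forgotten."""
    recent = deque(maxlen=window)  # appending beyond maxlen drops the oldest item
    for x in stream:
        recent.append(x)
    return sum(recent) / len(recent)


stream = make_stream(1000, "slow")
target = concept_mean(999, "slow")  # concept at the end of the stream
print("static error:  ", abs(static_estimate(stream) - target))
print("windowed error:", abs(windowed_estimate(stream) - target))
```

With a slow drift from 0 to 2, the frozen estimate stays near the initial concept, whereas the windowed estimate ends close to the current one. This is the behavior that incremental learning with forgetting aims for, and the window length plays the role of a memory limit.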
As the implementation of the classifier, we propose a novel modification of the weighted one-class classifier, which tunes the shape of its decision boundary on the basis of weights assigned to the training objects. The main contribution of this work is a novel one-class classifier for this task with built-in adaptation and forgetting mechanisms. The proposed method is evaluated in computer experiments on real and semi-synthetic data sets.

The outline of the work is as follows. First, related works on data stream classification and one-class classifiers are presented. Then, the original algorithm is described. The following section focuses on the results of the experimental research. The last part concludes the paper.

2 Related works

In this section, a short overview of the related fields is given.

$f_j(x) = f(x \mid j)$, $x \in \mathcal{X}$, $j \in \mathcal{M}$. The classification algorithm $\Psi$ maps the feature space $\mathcal{X}$ to the set of defined class labels $\mathcal{M}$. If the probability characteristics given by Eqs. (1) and (2) are known, then the optimal classifier $\Psi^*$, minimizing the misclassification probability, makes decisions according to the following rule: $\Psi^*(x) = i$ if $p_i(x) = \max_{k \in \mathcal{M}}$ (...truncated)