Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors

Machine Learning, Aug 2016

Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve, often colloquially summarised as 'more data the better'. We call this 'the gravity of learning curve', and it is assumed that no learning algorithms are 'gravity-defiant'. Contrary to this conventional wisdom, this paper provides theoretical analysis and empirical evidence that nearest neighbour anomaly detectors are gravity-defiant algorithms.


Kai Ming Ting · Takashi Washio · Jonathan R. Wells · Sunil Aryal

The Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Japan
School of Engineering and Information Technology, Federation University, Churchill, Australia

Editor: Joao Gama

Keywords: Learning curve; Anomaly detection; Nearest neighbour; Computational geometry; AUC

1 Introduction

In the machine learning context, a learning curve describes the rate of task-specific performance improvement of a learning algorithm as the training set size increases. A typical learning curve is shown in Fig. 1. The error, as a measure of the learning algorithm's performance, decreases quickly while the training sets are small; the rate of decrease then slows gradually until it reaches a plateau as the training sets grow large.

[Fig. 1: Typical learning curve versus gravity-defiant learning curve. x-axis: number of training instances; y-axis: testing error.]

Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve, though the actual rate of performance improvement may differ from one algorithm to another. We call this 'the gravity of learning curve', and it is assumed that no learning algorithms are 'gravity-defiant'. Recent research (Liu et al. 2008; Zhou et al. 2012; Sugiyama and Borgwardt 2013; Wells et al. 2014; Bandaragoda et al. 2014; Pang et al. 2015) has provided an indication that some algorithms may defy the gravity of learning curve, i.e., they can learn a better-performing model from a small training set than from a large one. However, no concrete evidence of the 'gravity-defiant' behaviour has been provided in the literature, let alone the reason why these algorithms behave this way.

'Gravity-defiant' algorithms have a key advantage: they produce a good-performing model from a training set significantly smaller than that required by 'gravity-compliant' algorithms. They thus yield savings in time and memory space that the conventional wisdom deemed impossible.

This paper focuses on nearest neighbour-based anomaly detectors because they have been shown to be one of the most effective classes of anomaly detectors (Breunig et al. 2000; Sugiyama and Borgwardt 2013; Wells et al. 2014; Bandaragoda et al. 2014; Pang et al. 2015). This paper makes the following contributions:

1. Provides a theoretical analysis of nearest neighbour-based anomaly detection algorithms which reveals that their behaviour defies the gravity of learning curve. As far as we know, this is the first analysis of learning curve behaviour in machine learning research that is based on computational geometry.

2. The theoretical analysis provides an insight into the behaviour of the nearest neighbour anomaly detector. In sharp contrast to the conventional wisdom of 'more data the better', the analysis reveals that sample size has three impacts which the conventional wisdom has not considered (a toy illustration follows this list). First, increasing the sample size increases the likelihood of anomaly contamination in the sample; any inclusion of anomalies in the sample increases the false negative rate and thus lowers the AUC. Second, the optimal sample size depends on the data distribution: as long as the data distribution is not sufficiently represented by the current sample, increasing the sample size will improve the AUC. The optimal size is the number of instances that best represents the geometry of normal instances and anomalies; this gives the optimal separation between normal instances and anomalies, encapsulated as the average nearest neighbour distance to anomalies. Third, increasing the sample size decreases the average nearest neighbour distance to anomalies; growing the sample beyond the optimal size shrinks the separation between normal instances and anomalies below the optimum. This decreases the AUC and gives rise to the gravity-defiant behaviour.

3. Presents empirical evidence of the gravity-defiant behaviour using three nearest neighbour-based anomaly detectors in the unsupervised learning context. In addition, this paper uncovers two features of nearest neighbour anomaly detectors: A. Some nearest neighbour anomaly detectors can achieve high detection accuracy with a significantly smaller sample size than others. B. A (...truncated)
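To make the three impacts above concrete, here is a minimal sketch (not the authors' code) of a 1-NN distance anomaly detector evaluated at increasing training sample sizes: each test point is scored by its distance to the nearest instance in a sample drawn from a lightly contaminated, unlabelled pool, and AUC is computed at each size. The synthetic data, the clumped anomalies, the contamination scheme, the sample sizes, and the nn_score helper are all illustrative assumptions; under them, AUC tends to peak at a modest sample size and then degrade, mirroring the contamination and shrinking-separation effects described in contribution 2.

```python
# Sketch only (not the authors' code): a 1-NN distance anomaly detector
# evaluated with training samples of increasing size. Every setting in
# this synthetic set-up is an illustrative assumption.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Normal data: one dense Gaussian cluster.
normal = rng.normal(size=(5000, 2))

# Anomalies: 10 small clumps of 5 points, scattered away from the cluster.
anomalies = (np.repeat(rng.uniform(-8.0, 8.0, size=(10, 2)), 5, axis=0)
             + rng.normal(scale=0.1, size=(50, 2)))

leak = np.zeros(50, dtype=bool)
leak[::5] = True                                      # one anomaly per clump leaks
pool = np.vstack([normal[:4500], anomalies[leak]])    # unlabelled, lightly contaminated
test_x = np.vstack([normal[4500:], anomalies[~leak]]) # held-out normals + anomalies
test_y = np.r_[np.zeros(500), np.ones(40)]            # 1 = anomaly

def nn_score(sample, queries):
    """Score each query by the distance to its nearest neighbour in the sample."""
    d = np.linalg.norm(queries[:, None, :] - sample[None, :, :], axis=2)
    return d.min(axis=1)

for n in (8, 32, 128, 512, 2048, 4510):
    idx = rng.choice(len(pool), size=n, replace=False)
    auc = roc_auc_score(test_y, nn_score(pool[idx], test_x))
    print(f"sample size {n:5d}  AUC {auc:.3f}")
```

A gravity-compliant learner would show AUC rising monotonically with the sample size n; under the assumptions above, the 1-NN detector's AUC can instead fall once the sample grows large enough to contain leaked anomalies, which shrink the nearest neighbour distances of the remaining test anomalies.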


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2Fs10994-016-5586-4.pdf

Kai Ming Ting, Takashi Washio, Jonathan R. Wells, Sunil Aryal. Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors. Machine Learning, Volume 106, Issue 1, 2017, pp. 55-91. DOI: 10.1007/s10994-016-5586-4