Classification of Multivariate Time Series and Structured Data Using Constructive Induction

https://link.springer.com/content/pdf/10.1007%2Fs10994-005-5826-5.pdf

MOHAMMED WALEED KADOUS, CLAUDE SAMMUT, Eamonn Keogh
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia

We present a method of constructive induction aimed at learning tasks involving multivariate time series data. Using metafeatures, the scope of attribute-value learning is expanded to domains with instances that have some kind of recurring substructure, such as strokes in handwriting recognition, or local maxima in time series data. The types of substructures are defined by the user, but are extracted automatically and are used to construct attributes. Metafeatures are applied to two real domains: sign language recognition and ECG classification. Using metafeatures we are able to generate classifiers that are either comprehensible or accurate, producing results that are comparable to hand-crafted preprocessing and comparable to human experts.

1. Introduction

There are many domains that do not easily fit into the static attribute-value model so common in machine learning. These include multivariate time series, optical character recognition, sequence recognition, basket analysis and web logs. Consequently, researchers hoping to apply attribute-value learners to these domains have few choices: apply hand-crafted preprocessing, write a learner specifically designed for the domain, or use a learner with a more powerful representation, such as relational learning or graph-based induction.

However, each of these has problems. Hand-crafted preprocessing is frequently used, but is time-consuming and requires in-depth domain knowledge. Writing a custom learner is possible, but is labour-intensive. Relational learning techniques tend to be very sensitive to noise and to the particular clausal representation selected. They are typically unable to process large data sets in a reasonable time frame, and/or require the user to set limits on the search such as refinement rules (Cohen, 1995).
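
To make the idea of a recurring substructure concrete, the following is a minimal Python sketch, not the authors' implementation. It treats local maxima in a single channel as the user-defined substructure, extracts them automatically, and turns them into a fixed-length vector of boolean attributes that any attribute-value learner could consume, regardless of series length. The names `LocalMax`, `extract_local_maxima`, `to_attribute_vector` and the hand-picked `REGIONS` are hypothetical and chosen only for illustration. In particular, the hand-picked regions are a simplifying assumption standing in for whatever automatic attribute construction a real system would use.

```python
from typing import List, NamedTuple, Sequence, Tuple


class LocalMax(NamedTuple):
    """One extracted event: when the peak occurs and how high it is."""
    time: int
    height: float


def extract_local_maxima(channel: Sequence[float]) -> List[LocalMax]:
    """Return every interior sample strictly greater than both neighbours."""
    return [
        LocalMax(t, channel[t])
        for t in range(1, len(channel) - 1)
        if channel[t - 1] < channel[t] > channel[t + 1]
    ]


# A region is (earliest time, latest time, minimum height).
Region = Tuple[int, int, float]


def to_attribute_vector(channel: Sequence[float],
                        regions: Sequence[Region]) -> List[bool]:
    """One boolean attribute per region: does any extracted event fall in it?"""
    events = extract_local_maxima(channel)
    return [
        any(t_lo <= e.time <= t_hi and e.height >= min_h for e in events)
        for (t_lo, t_hi, min_h) in regions
    ]


# Hypothetical hand-picked regions (an assumption made for this sketch).
REGIONS: List[Region] = [(0, 3, 2.0), (4, 9, 2.0)]

# Two series of different lengths map to vectors of the same length.
short_series = [0.0, 3.0, 1.0, 0.5]
long_series = [0.0, 1.0, 0.5, 0.2, 0.1, 0.4, 2.5, 0.3, 0.0]

short_vec = to_attribute_vector(short_series, REGIONS)
long_vec = to_attribute_vector(long_series, REGIONS)
```

Here each attribute reads as "a tall peak occurs early" or "a tall peak occurs late", which hints at why concepts learnt over such attributes can stay readable.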
In this paper, we use a generic constructive induction technique to allow for domains where instances exhibit recurring substructures. For example, with Chinese character recognition, the recurring substructure is a stroke. The user defines the recurring substructures (termed metafeatures), but subsequent steps are automated. Further, our experimental results show that a small set of generic metafeatures may be applicable to many temporal domains. These substructures are extracted, and a novel clustering algorithm is used to construct synthetic attributes based on the presence or absence of certain substructures. Standard learners can then be applied.

The learnt concepts are expressed using the same substructures identified by the user. Since these substructures are frequently the same concepts humans use themselves in classifying instances, this results in readable descriptions. To our knowledge, there are very few other systems that build comprehensible classifiers for multivariate time series.

There are several novel aspects to the approach. Recurring substructures are processed to construct a set of features using a specially designed clustering technique. Temporal events, global properties of the time series and specified attributes can all be combined within the well-understood propositional framework. The results of propositional learning are postprocessed to generate more human-readable descriptions. The net effect of employing these techniques is a system that can easily and simply be applied to new temporal classification problem domains with little or no modification.

This paper begins by motivating work in this field, and providing two examples, before giving an overview of both related fields and directly related work. A theoretical and practical definition of the problem is then presented.
An overview of the approach taken using a pedagogical domain is given, followed by a discussion of the implementation of TClass, our temporal classification learner, with several extensions to the basic idea. Experimental results using TClass on several domains are presented, followed by a conclusion and some suggestions for future work.

Prevalence and importance of temporal classification domains

There are many real domains that are temporal in nature. An examination of the UCI repository (Blake & Merz, 1998) reveals that there are at least six domains that were originally temporal classification tasks but were propositionalised to make it possible to use attribute-value learners.1 Examples of other problems that are temporal classification tasks include: gesture recognition, printed character recognition, speaker identification and/or authentication, classification of medical time series such as electrocardiographs and electroencephalograms, robot sensor data analysis and more.

Given the importance and prevalence of these temporal classification problems, this work aims to build a tool that can construct classifiers for these kinds of domains and be applied to them out of the box, in much the same way that a toolkit like Weka (Witten & Frank, 1999) is applied to propositional problems.

Consider two application domains of classification of multivariate time series (we will term this temporal classification for convenience). These two examples will provide us with some insights into the nature of the problem.

2.2.1. Tech support

This pedagogical domain is meant as an extremely simple example of temporal classification. A computer company called SoftCorp provides telephone technical support. These phone calls are recorded for analysis.
SoftCorp discovers that the handling of these phone calls has a huge impact on future buying patterns of its customers, so based on the recordings of tech support responses, they hope to find the critical difference between happy and angry customers. An engineer suggests that the volume level of the conversation is an indication of frustration level. SoftCorp divides each phone call into 30-second segments and works out the average volume in each segment. If a segment is high-volume, it is marked as H, while if it is at a reasonable volume level, it is labelled as L. On some subset of their data they determine by independent means whether the tech support calls resulted in happy or angry customers. Note that conversations are not of fixed length; some conversations are short, others take a bit longer. Six examples of recorded phone conversations are shown in Table 1. SoftCorp would like to employ machine learning to find rules to predict whether, at the end of a conversation, a customer is likely to be happy or angry.

2.2.2. Recognition of signs from a sign language

Consider the task of recognising signs from a sign language2 using instrumented gloves. The glove provides the information shown in Table 2. Each instance is labelled with its class, and all of the values are sampled approximately 23 times a second. Each training instance consists of a sequence of measurements. Training instances di (...truncated)