Decision trees for hierarchical multi-label classification
Celine Vens
Jan Struyf
Leander Schietgat
Sašo Džeroski
Hendrik Blockeel
S. Džeroski, Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS's FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction
time. We conclude that HMC trees should definitely be considered in HMC tasks where
interpretable models are desired.
1 Introduction
Classification refers to the task of learning from a set of classified instances a model that
can predict the class of previously unseen instances. Hierarchical multi-label classification
(HMC) differs from normal classification in two ways: (1) a single example may belong to
multiple classes simultaneously; and (2) the classes are organized in a hierarchy: an example
that belongs to some class automatically belongs to all its superclasses (we call this the
hierarchy constraint).
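The hierarchy constraint can be illustrated with a minimal sketch. The toy hierarchy, the class identifiers, and the helper names `close_under_hierarchy` and `satisfies_constraint` are assumptions made for illustration only; they are not taken from any particular system.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each class maps to the list of its parents.
# In a tree every class has at most one parent; a DAG would allow several.
PARENTS: Dict[str, List[str]] = {
    "1": [],
    "1/1": ["1"],
    "1/1/2": ["1/1"],
    "2": [],
    "2/1": ["2"],
}


def close_under_hierarchy(labels: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the label set extended with all superclasses of its members."""
    closed: Set[str] = set()
    stack = list(labels)
    while stack:
        c = stack.pop()
        if c not in closed:
            closed.add(c)
            stack.extend(parents[c])  # walk upwards through every parent
    return closed


def satisfies_constraint(labels: Set[str], parents: Dict[str, List[str]]) -> bool:
    """Check that every parent of every label is itself in the label set."""
    return all(p in labels for c in labels for p in parents[c])


closed_labels = close_under_hierarchy({"1/1/2", "2"}, PARENTS)
```

Because each class stores a list of parents, the same sketch also covers hierarchies in which a class has more than one parent.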
Examples of this kind of problem are found in several domains, including text
classification (Rousu et al. 2006), functional genomics (Barutcuoglu et al. 2006), and object
recognition (Stenger et al. 2007). In functional genomics, which is the application on which
we focus, an important problem is predicting the functions of genes. Biologists have a set
of possible functions that genes may have, and these functions are organized in a hierarchy
(see Fig. 1 for an example). It is known that a single gene may have multiple functions.
In order to understand the interactions between different genes, it is important to obtain an
interpretable model.
Several methods can be distinguished that handle HMC tasks. A first approach
transforms an HMC task into a separate binary classification task for each class in the hierarchy
and applies an existing classification algorithm. We refer to it as the SC (single-label
classification) approach. This technique has several disadvantages. First, it is inefficient, because
the learner has to be run |C| times, with |C| the number of classes, which can be hundreds
or thousands in some applications. Second, it often results in learning from strongly skewed
class distributions: in typical HMC applications classes at lower levels of the hierarchy
often have very small frequencies, while (because of the hierarchy constraint) the frequency
of classes at higher levels tends to be very high. Many learners have problems with strongly
skewed class distributions (Weiss and Provost 2003). Third, from the knowledge discovery
point of view, the learned models identify features relevant for one class, rather than
identifying features with high overall relevance. Finally, the hierarchy constraint is not taken
into account, i.e. it is not automatically imposed that an instance belonging to a class should
belong to all its superclasses.
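The SC transformation and the resulting class skew can be sketched as follows. The data and the helper names `binary_tasks` and `positive_fraction` are hypothetical; the sketch only shows how one multi-label data set becomes |C| binary target vectors, not how any particular learner is applied to them.

```python
from typing import Dict, List, Set


def binary_tasks(label_sets: List[Set[str]], classes: List[str]) -> Dict[str, List[int]]:
    """Build one 0/1 target vector per class (one binary task per class)."""
    return {c: [1 if c in labels else 0 for labels in label_sets] for c in classes}


def positive_fraction(targets: List[int]) -> float:
    """Fraction of positive examples in a binary task."""
    return sum(targets) / len(targets)


# Hypothetical label sets for four instances, already closed under the hierarchy.
label_sets = [
    {"1", "1/1"},
    {"1"},
    {"1", "1/1", "1/1/2"},
    {"2"},
]
classes = ["1", "1/1", "1/1/2", "2"]

tasks = binary_tasks(label_sets, classes)
fractions = {c: positive_fraction(t) for c, t in tasks.items()}
```

In this toy data the top-level class "1" is positive for three of the four instances, while the leaf class "1/1/2" is positive for only one, which mirrors the skew described above on a small scale.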
Fig. 1 A small part of the hierarchical FunCat classification scheme (Mewes et al. 1999)
A second approach is to adapt the SC method, so that this last issue is dealt with. Some
authors have proposed to hierarchically combine the class-wise models in the prediction
stage, so that a classifier constructed for a class c can only predict positive if the classifier
for the parent class of c has predicted positive (Barutcuoglu et al. 2006; Cesa-Bianchi et al.
2006). In addition, one can also take the hierarchy constraint into account during training
by restricting the training set for the classifier for class c to those instances belonging to the
parent class of c (Cesa-Bianchi et al. 2006). This approach is called the HSC (hierarchical
single-label classification) approach throughout the text.
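The two ingredients of HSC, restricting the training set to instances of the parent class and predicting top-down, can be sketched as follows. This is a minimal sketch that assumes a tree-shaped hierarchy (one parent per class); the per-class classifiers are stand-in callables, and the names `hsc_training_indices` and `hsc_predict` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

Instance = Dict[str, float]
Classifier = Callable[[Instance], bool]


def hsc_training_indices(
    label_sets: List[Set[str]], c: str, parent: Dict[str, Optional[str]]
) -> List[int]:
    """Indices of the instances used to train the classifier for class c:
    only instances that belong to the parent of c (all instances at the top level)."""
    p = parent[c]
    if p is None:
        return list(range(len(label_sets)))
    return [i for i, labels in enumerate(label_sets) if p in labels]


def hsc_predict(
    x: Instance,
    classifiers: Dict[str, Classifier],
    parent: Dict[str, Optional[str]],
    order: List[str],
) -> Set[str]:
    """Top-down prediction: class c can be predicted only if its parent was.
    `order` must list every parent before its children."""
    predicted: Set[str] = set()
    for c in order:
        p = parent[c]
        if (p is None or p in predicted) and classifiers[c](x):
            predicted.add(c)
    return predicted


# Hypothetical hierarchy and stand-in classifiers with fixed answers.
parent = {"1": None, "1/1": "1", "2": None, "2/1": "2"}
classifiers = {
    "1": lambda x: False,
    "1/1": lambda x: True,   # would fire, but its parent "1" is rejected
    "2": lambda x: True,
    "2/1": lambda x: True,
}
prediction = hsc_predict({}, classifiers, parent, ["1", "1/1", "2", "2/1"])
```

By construction, the predicted set always satisfies the hierarchy constraint, since a class is added only after its parent has been added.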
A third approach is to develop learners that learn a single multi-label model that predicts
all the classes of an example at once (Clare 2003; Blockeel et al. 2006). Next to taking
the hierarchy constraint into account, this approach is also able to identify features that are
relevant to all classes. We call this the HMC approach.
Given our target application of functional genomics, we focus on decision tree methods,
because of their interpretability. In Blockeel et al. (2006), we presented an empirical study
on the use of decision trees for HMC tasks. We presented an HMC decision tree learner, and
showed that it can outperform the SC approach on all fronts: predictive performance, model
size, and induction time.
In this article, we further investigate the suitability of decision trees for HMC tasks, by
extending the analysis along several dimensions. The most important contributions of this
work are the following:
• We consider three decision tree approaches towards HMC tasks: (1) learning a separate
binary decision tree for each class label (SC), (2) learning and applying such single-label
decision trees in a hierarchical way (HSC), and (3) learning one tree that predicts all
classes at once (HMC). The HSC approach has not been considered before in the context
of decision trees.
• We consider more complex class hierarchies. In particular, the hierarchies are no longer
constrained to trees, but can be directed acyclic graphs (DAGs). To our knowledge, this
setting has not been thoroughly studied before. We show how the decision tree approaches
can be modified to support class hierarchies with a DAG structure.
• The approaches are compared by performing an extensive experimental evaluation on 24
data sets from yeast functional genomics, using as classification schemes MIPS's
FunCat (Mewes et al. 1999) (tree structure) and the Gene Ontology (Ashburner et al. 2000)
(DAG structure). The latter results in data sets with (on average) 4000 class labels, which
underlines the scalability of the approaches to large class hierarchies.
When (...truncated)