Internal and collective interpretation for improving human interpretability of multi-layered neural networks
International Journal of Advances in Intelligent Informatics
Vol. 5, No. 3, November 2019, pp. 179-192
ISSN 2442-6571
Ryotaro Kamimura a,b,*
a Kumamoto Drone Technology and Development Foundation, Techno Research Park, Techno Lab 203, 1155-12, Japan
b IT Education Center, Tokai University, 4-1-1 Kitakaname, Hiratsuka, Kanagawa 259-1292, Japan
* corresponding author
ARTICLE INFO
Article history
Received July 1, 2019
Revised October 21, 2019
Accepted October 29, 2019
Available online October 29, 2019
Keywords
Mutual information
Internal interpretation
Collective interpretation
Inference mechanism
Generalization
ABSTRACT
The present paper aims to propose a new type of information-theoretic
method to interpret the inference mechanism of neural networks. We
interpret the internal inference mechanism by itself, without any external
methods such as symbolic or fuzzy rules. In addition, we make
interpretation processes as stable as possible. This means that we interpret
the inference mechanism by considering all internal representations created
under different conditions and with different patterns. To make internal
interpretation possible, we compress multi-layered neural networks
into the simplest ones, without hidden layers. Then, the information
naturally lost in the process of compression is compensated for by the
introduction of a mutual information augmentation component. The
method was applied to two data sets, namely, the glass data set and the
pregnancy data set. In both data sets, the information augmentation and
compression methods improved generalization performance. In
addition, the compressed or collective weights from the multi-layered networks
tended, ironically, to be similar to the linear correlation
coefficients between inputs and targets, whereas conventional methods
such as logistic regression analysis failed to produce such weights.
This is an open access article under the CC–BY-SA license.
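The compression step and the comparison with correlation coefficients described in the abstract can be illustrated with a small sketch. It assumes, as a simplification not taken from this paper, that a multi-layered network is compressed by chaining the products of its connection-weight matrices, ignoring activation functions and biases. The helpers `compress`, `pearson`, and `input_target_correlations` are hypothetical names for illustration, not the paper's implementation, and the mutual information augmentation component is not modeled.

```python
import math

# Assumption (not the paper's method): compression = product of successive
# weight matrices, ignoring activation functions and biases.


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (both lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def compress(layers):
    """Collapse [W1 (inputs x h1), W2 (h1 x h2), ..., WL (h_{L-1} x outputs)]
    into a single inputs x outputs matrix with no hidden layer."""
    result = layers[0]
    for w in layers[1:]:
        result = matmul(result, w)
    return result


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def input_target_correlations(X, y):
    """Correlation between each input column of X and the target y."""
    return [pearson([row[j] for row in X], y) for j in range(len(X[0]))]


# Hypothetical toy weights: 2 inputs -> 2 hidden -> 1 output.
W1 = [[0.5, -1.0], [2.0, 0.0]]
W2 = [[1.0], [0.5]]
print(compress([W1, W2]))  # [[0.0], [2.0]]

# Hypothetical toy data: 4 patterns, 2 inputs, 1 target.
X = [[1, 2], [2, 1], [3, 4], [4, 3]]
y = [1, 2, 3, 4]
print(input_target_correlations(X, y))
```

With real trained weights, the column of compressed weights for a target can then be set side by side with the input-target correlations, for example by checking sign agreement or computing their own correlation, to examine the kind of similarity the abstract reports.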
1. Introduction
Machine learning is now used in many areas of daily life, and its use has caused some problems. As
the underlying techniques become larger and more complex, it becomes harder to interpret their main
inference mechanism and to explain why and how machine learning techniques reach their final
decisions. Because these methods have had a serious influence on our safety [1], and because users
of the techniques should have the right to receive an explanation of how decisions are made, there
is an urgent need to develop methods to interpret and explain the main inference mechanism of
machine learning techniques [2].
Thus, many types of interpretation methods have been developed in machine learning. They can
be classified into two types: internal and external interpretation. Internal interpretation
methods aim to produce models whose components can be directly inspected and interpreted [3]–[5].
By contrast, external interpretation methods treat models as black boxes and try to interpret the
inference mechanism from the outside [6]–[8]. In neural networks, as in other machine learning
techniques, interpretation methods have been classified as “decompositional” or
“pedagogic” [9]. The pedagogic approach treats the network as a black box and tries to infer the
relations between inputs and outputs only by inspecting the inputs and outputs externally. The
decompositional approach tries to analyze components such as connection weights and neuron
activations directly, so it can be considered an instance of the internal interpretation mentioned
above. However, in the decompositional approach, many external methods, such as symbolic rules,
fuzzy rules, and decision trees, have usually been used to analyze and represent the components
[10]–[12]. In addition, many techniques, such as the digitization of inputs and outputs, have been
applied to extract rules [9]. Thus, those methods cannot be called “internal interpretation” methods:
they interpret the final results by external methods, so it is more appropriate to call them
“external interpretation.”
http://dx.doi.org/10.26555/ijain.v5i3.420
http://ijain.org
The objective of interpretation is two-fold. First, and naturally, interpretation methods can be used
to explain the inference mechanism in human-intelligible ways. Second, clarifying the inference
mechanism can help improve general properties of neural networks, such as generalization performance.
With respect to these two aspects, the interpretation methods developed so far have depended on methods
unrelated to the real inference mechanism of neural networks. Thus, to further improve the
performance of neural networks, it is necessary to interpret the main inference mechanism internally.
In addition to the problem of external interpretation, we face another problem, that of unstable
interpretation. Ordinarily, machine learning models, including neural networks, are trained with many
different data sets and initial conditions, in particular when evaluating generalization performance.
Thus, even for the same data set, we can obtain completely different internal representations due to
different initial conditions. The problem is deciding which of these many representations we should
interpret. One practical solution is to interpret the representation with the best generalization
performance. However, this means that we view the ability of neural networks from only one aspect,
namely improved generalization. We think that all representations created by different data sets and
initial conditions should be taken into account to uncover the fundamental properties of data sets.
Thus, to address the instability of interpretation, we should collectively interpret all internal
representations created by learning, giving each representation equal importance. It seems to us that
the problem of collective interpretation has not been fully examined in machine learning or neural
networks, except in some cases involving ensemble methods [9], [13]. In this context, the present
paper proposes a new type of interpretation called “collective interpretation,” in which all
representations from neural networks are taken into account with equal importance.
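The contrast between interpreting only the best-performing run and interpreting all runs collectively, with equal importance, can be sketched as follows. The sketch assumes that each run (a different data split or initial condition) has already been reduced to an input-to-output weight matrix, and that "equal importance" means a plain arithmetic mean. The functions `collective_weights` and `best_run_weights`, and all numbers below, are hypothetical and for illustration only; they are not the paper's procedure or results.

```python
def collective_weights(runs):
    """Average input x output weight matrices from several runs, giving each
    run equal importance (assumption: a plain arithmetic mean)."""
    n = len(runs)
    rows, cols = len(runs[0]), len(runs[0][0])
    return [[sum(run[i][j] for run in runs) / n for j in range(cols)]
            for i in range(rows)]


def best_run_weights(runs, scores):
    """Conventional alternative: keep only the run with the best
    generalization score and discard all the others."""
    best = max(range(len(runs)), key=lambda k: scores[k])
    return runs[best]


# Hypothetical weights from three runs (2 inputs -> 1 output), each from a
# different initial condition, with hypothetical generalization accuracies.
runs = [
    [[1.0], [-2.0]],
    [[3.0], [0.0]],
    [[2.0], [-1.0]],
]
scores = [0.81, 0.85, 0.79]

print(best_run_weights(runs, scores))  # [[3.0], [0.0]]
print(collective_weights(runs))        # [[2.0], [-1.0]]
```

In this toy case, the single best run suggests that the second input is irrelevant, whereas the collective weights retain the negative contribution of the second input that appears in two of the three runs.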
We have argued that neural networks should be interpreted internally and that all different types of
internal representations should be interpreted collectively. Let us consider how to create neural
networks with those properties for interpretation. As mentioned, in neural networks, there have been
many types of interpretation methods, and the majority of those (...truncated)