Learning from Innate Behaviors: A Quantitative Evaluation of Neural Network Controllers
NOEL E. SHARKEY
Editors: Henry Hexmoor and Maja Mataric
Department of Computer Science, University of Sheffield, U.K.
The aim was to investigate a method of developing mobile robot controllers based on ideas about how plastic neural systems adapt to their environment by extracting regularities from the amalgamated behavior of inflexible (non-plastic) innate subsystems interacting with the world. Incremental bootstrapping of neural network controllers was examined. The objective was twofold: first, to develop and evaluate the use of prewired or innate robot controllers to bootstrap backpropagation learning in Multi-Layer Perceptron (MLP) controllers; second, to develop and evaluate a new MLP controller trained on the back of another bootstrapped controller. The experimental hypothesis was that the MLPs would improve on the performance of the controllers used to train them. The performance of the innate and bootstrapped MLP controllers was compared in eight experiments on the tasks of avoiding obstacles and finding goals. Four quantitative measures were employed: the number of sensorimotor loops required to complete a task, the distance traveled, the mean distance from walls and obstacles, and the smoothness of travel. The overall pattern of results from statistical analyses of these measures supported the hypothesis: the MLP controllers completed the tasks faster and more smoothly, and steered further from obstacles and walls, than their innate teachers. In particular, a single MLP controller incrementally bootstrapped by an MLP subsumption controller was superior to the others.
1. Introduction
Neural computing techniques are becoming increasingly popular for training controllers
for mobile robots (Bekey & Goldberg, 1993, Dorigo, 1996, Sharkey, 1997). Many of the
most widely used methods, such as reinforcement learning (Krose, 1995) or evolutionary
learning (Nolfi, 1997), require little a priori knowledge of the task domain. Supervised
learning, on the other hand, has been neglected because of a belief that the designer must
provide precise teaching vectors to train controllers, i.e., the experimenter must understand
the domain in enough detail to calculate exactly the correct control signals for every move
(Sharkey, 1997b). However, an alternative approach to
supervised learning is presented here in which controllers are trained by the system in which
This research was inspired by the suggestion, drawn from biological systems, that successful
adaptation may involve the use of innate hardwired behaviors to bootstrap learning in plastic
neural nets. For example, simple observation shows that a number of quadrupeds, such
as giraffes and zebras, are able to run almost from birth, with jerky movements that quickly
become smoother as the animal adapts to its environment. In detailed experimental work, Johnson
and colleagues (Johnson, 1992, Johnson & Bolhuis, 1991) found that newly hatched chicks show a
limited range of automatic behaviors, or predispositions, that are triggered by particular
environmental stimuli, such as pecking at static objects of certain dimensions and contrast, and
running toward warmth. On the basis of a number of findings on the neural system of the
chick, Johnson (1992) proposed two comparatively independent processes: one concerned
with predispositions and the other with a learning system subserved by the neural structure
IMHV (intermediate and medial hyperstriatum ventrale). The general idea is that the first
process ensures that the second learns.
The studies reported here develop and assess the use of prewired or innate controllers
to bootstrap learning in Multi-Layer Perceptrons (MLPs) that navigate a mobile robot
to goals in an obstacle-laden environment (cf. Nolfi & Parisi, 1997). This task was chosen
because it is commonly used as a behavioral benchmark for controllers (Anderson & Donath,
1990, Donnart & Meyer, 1996, Meeden, 1996, Floreano & Mondada, 1996, Touzet, 1997,
Salomon, 1997, del Millan, 1996) and has been used in autonomous robotics for several
decades (Walter, 1950).
Given previous work, it seemed reasonable to assume that an existing controller could be
used to train a neural network controller. However, there are two novel questions here.
First, could the performance of simple innate hardwired reactive controllers be improved
by using them to bootstrap learning in neural network controllers? Second, could further
improvements be gained by training on the back of previously trained neural network
controllers? Such improvements may be possible if the robot's behavior exhibits an
underlying systematic aspect when it interacts with the world. It is this systematic
aspect that the networks would extract during training.
An MLP trained with backpropagation can be used to construct a predictive model of a
data generator. If the generator is noisy, an MLP with an appropriate number of free
parameters may capture the regularities in the data and make novel predictions that are within
a small margin of error. In these terms, the innate controller may be thought of as a noisy
data generator, in which behavior considered to be noise in one type of environment may be
considered appropriate in another. Thus, the systematic aspect of the behavior of an
innate controller depends on the particular environmental circumstances, e.g., swamp,
forest, desert, or office building. Seen this way, the problem of finding the systematic
aspect of the behavior is analogous to fitting polynomial curves to noisy data. If
the order of the polynomial is too high, e.g., there are as many free parameters as there
are data points, the data will be over-fitted and the approximation to the underlying
function (the generalization performance) will be poor. If, on the other hand, the order of
the polynomial is appropriate, the curve will pass smoothly through the noisy
data and generalize well to novel inputs.
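The polynomial analogy can be made concrete with a short numerical sketch. The Python below is illustrative only and is not taken from the experiments reported here: it assumes a hypothetical underlying function, sin(πx), standing in for the systematic aspect of the behavior, adds Gaussian noise (standard deviation 0.2) at 15 training points, and compares a cubic fit with an order-14 fit, which has as many free parameters as there are data points. Generalization is measured as the mean squared error against the noise-free function on a dense grid of novel inputs.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def underlying(x):
    # Hypothetical "systematic aspect" of the behavior (illustration only).
    return np.sin(np.pi * x)


def fit_and_score(order, x_train, y_train, x_test):
    """Least-squares polynomial fit of the given order.

    Returns (MSE on the noisy training points,
             MSE against the noise-free underlying function on x_test).
    """
    coeffs = P.polyfit(x_train, y_train, order)
    train_mse = np.mean((P.polyval(x_train, coeffs) - y_train) ** 2)
    test_mse = np.mean((P.polyval(x_test, coeffs) - underlying(x_test)) ** 2)
    return train_mse, test_mse


rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 15)
y_train = underlying(x_train) + rng.normal(0.0, 0.2, size=x_train.shape)
x_test = np.linspace(-1.0, 1.0, 401)  # novel inputs between the training points

# Order 3: few free parameters. Order 14: one free parameter per data point.
results = {order: fit_and_score(order, x_train, y_train, x_test) for order in (3, 14)}
for order, (train_mse, test_mse) in results.items():
    print(f"order {order:2d}: train MSE {train_mse:.4f}, generalization MSE {test_mse:.4f}")
```

The order-14 polynomial reproduces the noisy training points almost exactly but oscillates between them, so its generalization error is much larger than that of the cubic, which smooths over the noise.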
The general behavior of an innate controller may be represented by the function $e_i(x)$,
where $x$ can be external and/or internal stimuli. In a particular type of environment only
a subset of the behavior may be employed, and this can be represented by another (sub)function
$e_w(x)$. Unless $e_w(x)$ is optimal, adaptation consists of finding a function $a(x)$ that
is appropriate for the current environmental circumstances. To do this, the learning
method for a plastic neural net must find regularities in the behavior generated by $e_w(x)$.
That is, the neural net must treat $e_w(x)$ as if it were $a(x)$ embedded in behavioral noise.
In other words, adaptation depends on the quality of the network's approximation of $a(x)$.
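The same idea can be sketched for a network. Assuming, purely for illustration, that the appropriate response $a(x)$ is sin(πx/2) for a normalized sensor difference x, and that the innate controller $e_w(x)$ adds Gaussian behavioral noise (standard deviation 0.3) to it, the Python below trains a small 1-10-1 tanh MLP by backpropagation on the teacher's noisy responses. It then measures how far the teacher and the trained network each are from $a(x)$. The target function, noise model, network size, learning rate and epoch count are all hypothetical choices, not those of the robot experiments.

```python
import numpy as np

rng = np.random.default_rng(1)


def a(x):
    # Hypothetical "appropriate" response a(x) to a normalized sensor
    # difference x in [-1, 1]; chosen only for illustration.
    return np.sin(np.pi * x / 2)


def innate_teacher(x):
    # Hypothetical innate controller e_w(x): a(x) corrupted by behavioral noise.
    return a(x) + rng.normal(0.0, 0.3, size=x.shape)


# Training data: stimuli the robot met and the innate controller's responses.
x = rng.uniform(-1.0, 1.0, size=(200, 1))
t = innate_teacher(x)

# 1-10-1 MLP: tanh hidden layer, linear output unit.
hidden = 10
W1 = rng.normal(0.0, 1.0, size=(1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
b2 = np.zeros(1)

lr = 0.05
for epoch in range(4000):
    # Forward pass.
    h = np.tanh(x @ W1 + b1)            # hidden activations, shape (200, hidden)
    y = h @ W2 + b2                     # network output, shape (200, 1)
    # Backward pass for the mean squared error loss.
    dy = 2.0 * (y - t) / len(x)         # dLoss/dy
    dW2 = h.T @ dy
    db2 = dy.sum(axis=0)
    dh = (dy @ W2.T) * (1.0 - h ** 2)   # back through tanh
    dW1 = x.T @ dh
    db1 = dh.sum(axis=0)
    # Full-batch gradient-descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

# Compare teacher and student against the (normally unknown) a(x).
x_grid = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
net_out = np.tanh(x_grid @ W1 + b1) @ W2 + b2
net_mse = float(np.mean((net_out - a(x_grid)) ** 2))
teacher_mse = float(np.mean((t - a(x)) ** 2))
print(f"teacher MSE vs a(x): {teacher_mse:.4f}; network MSE vs a(x): {net_mse:.4f}")
```

Because the network has far fewer free parameters than there are training samples, it cannot reproduce the teacher's noise sample by sample and instead tends toward the underlying regularity. This is the sense in which a trained controller may outperform the innate controller that taught it.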
Two innate modules were developed for the current research: one for avoiding obstacles
and walls, and the other for finding goal locations. These were used individually as
controllers, as illustrated in Figure 1(a), or combined in a simplified subsumption architecture
control system (Brooks, 1986) as shown in Figure 1(b). The innate obstacle avo (...truncated)