Learning from Innate Behaviors: A Quantitative Evaluation of Neural Network Controllers

NOEL E. SHARKEY
Department of Computer Science, University of Sheffield, U.K.

Editors: Henry Hexmoor and Maja Mataric

The aim was to investigate a method of developing mobile robot controllers based on ideas about how plastic neural systems adapt to their environment by extracting regularities from the amalgamated behavior of inflexible (non-plastic) innate subsystems interacting with the world. Incremental bootstrapping of neural network controllers was examined. The objective was twofold. First, to develop and evaluate the use of prewired or innate robot controllers to bootstrap backpropagation learning for Multi-Layer Perceptron (MLP) controllers. Second, to develop and evaluate a new MLP controller trained on the back of another bootstrapped controller. The experimental hypothesis was that MLPs would improve on the performance of the controllers used to train them. The performances of the innate and bootstrapped MLP controllers were compared in eight experiments on the tasks of avoiding obstacles and finding goals. Four quantitative measures were employed: the number of sensorimotor loops required to complete a task; the distance traveled; the mean distance from walls and obstacles; and the smoothness of travel. The overall pattern of results from statistical analyses of these quantities supported the hypothesis: the MLP controllers completed the tasks faster and more smoothly, and steered further from obstacles and walls, than their innate teachers. In particular, a single MLP controller incrementally bootstrapped by an MLP subsumption controller was superior to the others.

1. Introduction

Neural computing techniques are becoming increasingly popular for training controllers for mobile robots (Bekey & Goldberg, 1993; Dorigo, 1996; Sharkey, 1997). Some of the most used methods have been concerned with techniques such as reinforcement learning (Krose, 1995) or evolutionary learning (Nolfi, 1997) that require little a priori knowledge of the task domain.
Supervised learning, on the other hand, has been neglected because of a belief that the designer must provide precise teaching vectors to train controllers, i.e., the experimenter must understand the domain in enough detail to calculate exactly the correct control signals for every move (Sharkey, 1997b). However, an alternative approach to supervised learning is presented here in which controllers are trained by the system in which [...]

This research began with an inspiration from biological systems: successful adaption may involve the use of innate hardwired behaviors to bootstrap learning in plastic neural nets. For example, simple observation shows that a number of quadrupeds, such as giraffes and zebras, are born on the run with jerky behavior that quickly becomes smoother as the animal adapts to its environment. In detailed experimental work, Johnson (Johnson, 1992; Johnson & Bolhuis, 1991) found that newly hatched chicks show a limited range of automatic behaviors or predispositions, such as pecking at static objects with certain dimensions and contrast and running toward warmth, that are triggered by particular environmental stimuli. Considering a number of findings on the neural system of the chick, Johnson (1992) proposed two comparatively independent processes: one concerned with predispositions and the other with a learning system subserved by the neural structure IMHV. The general idea is that the first process ensures that the second learns.

The studies reported here develop and assess the use of prewired or innate controllers to bootstrap learning in Multi-Layer Perceptrons (MLPs) that navigate a mobile robot to goals in an obstacle-laden environment (cf. Nolfi & Parisi, 1997).
This task was chosen because of its common use as a behavioral benchmark for controllers (Anderson & Donath, 1990; Donnart & Meyer, 1996; Meeden, 1996; Floreano & Mondada, 1996; Touzet, 1997; Salomon, 1997; del Millan, 1996) and its employment in autonomous robotics for several decades (Walter, 1950).

Given the previous work, it seemed reasonable to assume that an already existing controller could be used to train a neural network controller. However, there are two novel questions here. First, could an improvement be gained over the performance of simple innate hardwired reactive controllers by using them to bootstrap learning in neural network controllers? Second, could further improvements be gained by training on the back of previously trained neural network controllers? Such improvements may be possible if the robot's behavior exhibits an underlying systematic aspect when it interacts with the world. It is this systematic aspect that would be extracted by the networks during training.

An MLP, trained with backpropagation, can be used to construct a predictive model of a data generator. If this is a noisy generator, then an MLP with an appropriate number of free parameters may capture the regularities in the data and make novel predictions that are within a small margin of error. In these terms, the innate controller may be thought of as a noisy data generator in which behavior considered to be noise in one type of environment may be considered to be appropriate in another. Thus, the systematic aspect of the behaviors of an innate controller will depend on the particular environmental circumstances, e.g., swamp, forest, desert, office building, etc. Seen this way, the problem of finding the systematic aspect of the behavior is analogous to using polynomials to fit curves to noisy data.
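The curve-fitting analogy can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the experiments reported here: the underlying regularity is taken to be sin(pi x) on [-1, 1], the "behavioral noise" is Gaussian, and the names true_fn, fit_least_squares, interpolate and run_trials are hypothetical. It compares a cubic least-squares fit with a polynomial that has as many free parameters as there are data points, measuring each against the noise-free function.

```python
import math
import random


def true_fn(x):
    """Hypothetical stand-in for the underlying regularity (an assumption)."""
    return math.sin(math.pi * x)


def fit_least_squares(xs, ys, degree):
    """Fit a polynomial of the given degree by solving the normal equations."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution for the coefficients, lowest power first.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs))


def interpolate(xs, ys):
    """Lagrange polynomial through every point: one parameter per data point."""
    def poly(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            weight = 1.0
            for k, xk in enumerate(xs):
                if k != j:
                    weight *= (x - xk) / (xj - xk)
            total += yj * weight
        return total
    return poly


def mse(model, xs, targets):
    """Mean squared error of a model against target values."""
    return sum((model(x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)


def run_trials(n_trials=100, n_points=10, noise_sd=0.3, seed=0):
    """Average error against the noise-free function for both polynomial orders."""
    rng = random.Random(seed)
    train_x = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    test_x = [-1.0 + 2.0 * i / 100 for i in range(101)]
    test_y = [true_fn(x) for x in test_x]
    low_err = high_err = 0.0
    for _ in range(n_trials):
        noisy = [true_fn(x) + rng.gauss(0.0, noise_sd) for x in train_x]
        low = fit_least_squares(train_x, noisy, degree=3)
        high = interpolate(train_x, noisy)
        low_err += mse(low, test_x, test_y) / n_trials
        high_err += mse(high, test_x, test_y) / n_trials
    return low_err, high_err


if __name__ == "__main__":
    low_err, high_err = run_trials()
    print(f"cubic fit error: {low_err:.3f}")
    print(f"interpolating polynomial error: {high_err:.3f}")
```

The interpolating polynomial reproduces every noisy sample exactly, so its error on the training data is essentially zero, yet its average error against the noise-free function is larger than that of the cubic. An MLP with too many free parameters relative to the training data is expected to behave analogously.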
If the order of the polynomial is too high, e.g., there are as many free parameters as there are data points, then the data will be over-fitted and the approximation to the underlying function (the generalization performance) will be poor. If, on the other hand, the order of the polynomial is at an appropriate level, the curve will pass smoothly through the noisy data and generalize well on novel inputs.

The general behavior of an innate controller may be represented by the function ei(x), where x can be external and/or internal stimuli. In a particular type of environment only a subset of the behavior may be employed, and this can be represented by another (sub)function ew(x). Unless ew(x) is optimal, adaption consists of finding a function a(x) that is appropriate for the current environmental circumstance. In order to do this, the learning method for a plastic neural net must find regularities in the behavior generated by ew(x). That is, the neural net must treat ew(x) as if it were a(x) embedded in behavioral noise. In other words, adaption depends on the quality of the network's approximation of a(x).

Two innate modules were developed for the current research: one for avoiding obstacles and walls and the other for finding goal locations. These were used individually as controllers, as illustrated in Figure 1(a), or combined in a simplified subsumption architecture control system (Brooks, 1986), as shown in Figure 1(b). The innate obstacle avo (...truncated)