Embodied Gesture Processing: Motor-Based Integration of Perception and Action in Social Artificial Agents
Amir Sadeghipour, Stefan Kopp
A. Sadeghipour (corresponding author), S. Kopp: Sociable Agents Group, Cognitive Interaction Technology (CITEC), Bielefeld University, P.O. Box 100 131, 33501 Bielefeld, Germany
A close coupling of perception and action processes is assumed to play an important role in basic capabilities of social interaction, such as guiding attention and observation of others' behavior, coordinating the form and functions of behavior, or grounding the understanding of others' behavior in one's own experiences. In the attempt to endow artificial embodied agents with similar abilities, we present a probabilistic model for the integration of perception and generation of hand-arm gestures via a hierarchy of shared motor representations, allowing for combined bottom-up and top-down processing. Results from human-agent interactions are reported demonstrating the model's performance in learning, observation, imitation, and generation of gestures.
In social interactions, one is continuously confronted with
an intricate complexity of verbal and nonverbal behavior,
including hand-arm gestures, body movements or facial
expressions. All of these behaviors can be indicative of the
other's referential, communicative, or social intentions [1].
In this paper, we focus on hand-arm gestures. Interlocutors
in social interaction incessantly and concurrently produce
and perceive a variety of gestures. Roughly speaking, generating a
hand-arm gesture consists of two steps: first, finding
the proper gesture for an intention that is to be realized
under the current context constraints; second, performing the
gesture using one's motor repertoire. Similarly, the
recipient perceives and analyzes the other's movement at both the
motor and the intention level. Accumulating evidence suggests
that these two processes are not separate, but that
recognizing and understanding a gesture is grounded in the
perceiver's own motor repertoire [2, 3]. In other words, a hand
movement is understood, at least partially, by evoking the
motor system of the observer. This is evidenced by
so-called motor resonances, which show that the motor and action
(premotor) systems become activated during both
performance and observation of bodily behavior [4-6]. One
hypothesis is that these neural resonances reflect the
involvement of the motor system in deriving predictions
and evaluating hypotheses about the incoming observations.
This integration of perception and action enables imitating
or mimicking the observed behavior, either overtly or
covertly, and thus forms an embodied basis for
understanding other embodied agents [7], and for communication
and intersubjectivity of intentional agents more generally
(cf. simulation theory [8]). Hence, perception-action links
(and the resulting resonances) are assumed to be effective at
various levels of a hierarchical perceptual-motor system,
from kinematic features to motor commands to goals and
intentions [9], and these levels interact bidirectionally,
both bottom-up and top-down [10]. Further, a close
perception-action integration can be assumed to support two important
ingredients of social interaction: first, fast and often
subconscious interpersonal coordination (e.g., alignment,
mimicry, interactional synchrony) that leads to rapport [11]
and social resonance [12] between interactants; second,
social learning of behavior by means of imitation, which
helps to acquire and interactively establish behavior
through connected perceiving, processing, and reproducing
of its pertinent features. All of these
effects may also apply, at least to a certain extent, to the
interaction between humans and embodied agents, be they
physical robots or virtual characters (see [12] for a detailed
discussion). For example, brain imaging studies [13, 14]
showed that artificial agents with sufficiently natural
appearance and movements can evoke motor resonances in
human observers.
Fig. 1 Overall model for cognitive processes of embodied perception and generation, integrated in a shared motor knowledge
Against this background, we aim for interactive
embodied systems ultimately able to engage in social
interactions, in a human-like manner, based on cognitively
plausible mechanisms. A central ingredient is a
computational model for integrated perception and generation of
hand-arm gestures. This model has to fulfill a number of
requirements: (1) perceiving and generating behavior in a
fast, robust, and incremental manner; (2) concurrent and
mutually interacting perception and generation; (3)
concurrent processing at different levels of motor abstraction,
from movement trajectories to intentions; and (4) incremental
construction of hierarchical knowledge structures through
learning from observation and imitation.
In this paper, we present a cognitive computational
model that has been devised and developed to meet the
above-mentioned requirements for the domain of hand-arm
gestures. Focusing on the motor aspect of gestures, it
should also serve as a basis for future modeling of higher
cognitive levels of social intentions. In the section "Shared
Motor Knowledge Model", we introduce the shared motor
knowledge that serves as a basis for integrating
perception and action, both of which operate upon these
knowledge structures by means of forward/inverse models.
In "A Probabilistic Model of Motor Resonances", we
present a probabilistic approach to simulating fast,
incremental, and concurrent resonances and their exploitation of
these structures in both perceiving and generating behavior.
The section "Perception-Action Integration" details how the
integration of perception and action is achieved in this
model and how this helps to model and cope with
characteristics of nonverbal human social interaction. Results
of applying this model to real-world data (marker-free
gesture tracking) from a human-agent interaction scenario
are reported in "Results". In the final section, we compare
our work with related work.
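To make the idea of fast, incremental, probabilistic perception with forward models more concrete, the following toy sketch may help. It is not the model presented in this paper: the gesture prototypes, the isotropic Gaussian likelihood, the value of sigma, and all function names are assumptions chosen purely for illustration. Each hypothetical prototype acts as a simple forward model that predicts the next 2D wrist position, and the belief over prototypes is updated after every observation; a non-uniform prior can stand in for top-down guidance.

```python
import math

# Hypothetical gesture prototypes (illustrative only): each acts as a simple
# forward model mapping a time step to a predicted 2D wrist position.
PROTOTYPES = {
    "wave": lambda t: (math.sin(t), 1.0),
    "circle": lambda t: (math.cos(t), math.sin(t)),
    "raise": lambda t: (0.0, 0.2 * t),
}


def gaussian_likelihood(observed, predicted, sigma=0.3):
    """Unnormalized isotropic Gaussian likelihood of an observed 2D point."""
    sq_dist = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def update(belief, observed, t):
    """One incremental Bayesian step: weight each hypothesis by how well its
    forward prediction matches the observation, then renormalize."""
    weighted = {
        name: belief[name] * gaussian_likelihood(observed, PROTOTYPES[name](t))
        for name in belief
    }
    total = sum(weighted.values())
    if total == 0.0:
        # No hypothesis explains the observation; keep the previous belief.
        return dict(belief)
    return {name: w / total for name, w in weighted.items()}


def perceive(observations, prior=None):
    """Process observations one at a time and return the belief after each.

    `prior` is an optional dict over prototype names; it plays the role of
    top-down expectation. Without it, a uniform prior is assumed.
    """
    if prior:
        belief = dict(prior)
    else:
        belief = {name: 1.0 / len(PROTOTYPES) for name in PROTOTYPES}
    history = []
    for t, obs in enumerate(observations):
        belief = update(belief, obs, t)
        history.append(belief)
    return history


if __name__ == "__main__":
    # Simulated observation: wrist positions following the "circle" prototype.
    observed = [(math.cos(t), math.sin(t)) for t in range(5)]
    for step, belief in enumerate(perceive(observed)):
        best = max(belief, key=belief.get)
        print(step, best, round(belief[best], 3))
```

Because the belief is available after every observation, a hypothesis can be acted upon before the gesture has been observed in full, which is the sense of "incremental" intended in this sketch.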
Shared Motor Knowledge Model
In previous work [15], we have presented a cognitive
model for hierarchical representations of motor knowledge
for hand-arm gestures, and we proposed how these
structures can be utilized for probabilistic embodied behavior
perception. Here, we present an extended version of this
model that serves as a unified basis for both perception and
generation of hand-arm movements (wrist position
trajectories, to be specific) as they occur in natural gesturing by
human users in interaction with a humanoid virtual agent.
Overall, the model consists of three main modules (see
Fig. 1): shared motor knowledge, perception, and
generation. It allows for parallel gesture generation and
perception processes grounded in shared motor knowledge.
Further, the hierarchical model enables bottom-up
processing (mainly for perceptual tasks) interacting
bidirectionally with top-down processing (for action production as
well as attention and perception guidance). In the
remainder of this section, we d (...truncated)