RESEARCH ARTICLE
A real-time gesture recognition system using near-infrared imagery
Tomás Mantecón*, Carlos R. del-Blanco, Fernando Jaureguizar, Narciso García
Grupo de Tratamiento de Imágenes, Information Processing and Telecommunications Center and ETSI
Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain
Citation: Mantecón T, del-Blanco CR, Jaureguizar F, García N (2019) A real-time gesture recognition system using near-infrared imagery. PLoS ONE 14(10): e0223320. https://doi.org/10.1371/journal.pone.0223320
Abstract
Visual hand gesture recognition systems are promising technologies for Human Computer
Interaction, as they allow a more immersive and intuitive interaction. Most of these systems
are based on the analysis of skeleton information, which is in turn inferred from color, depth,
or near-infrared imagery. However, the robust extraction of skeleton information from
images is only possible for a subset of hand poses, which restricts the range of gestures that
can be recognized. In this paper, a real-time hand gesture recognition system based on a
near-infrared device is presented, which directly analyzes the infrared imagery to infer static
and dynamic gestures, without using skeleton information. Thus, a much wider range of
hand gestures can be recognized than with skeleton-based approaches. To validate the proposed system, a new dataset of near-infrared imagery has been created, on which the system obtains good results that outperform other state-of-the-art strategies.
Editor: Wajid Mumtaz, National University of
Sciences and Technology, PAKISTAN
Received: April 10, 2019
Accepted: September 18, 2019
Published: October 3, 2019
Copyright: © 2019 Mantecón et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: All relevant data are available at http://www.gti.ssr.upm.es/data/MultiModalHandGesture_dataset.
Funding: This work has been partially supported by the Ministerio de Economía, Industria y Competitividad (AEI/FEDER) of the Spanish Government under project TEC2016-75981 (IVME). There was no additional external funding received for this study.
Competing interests: The authors have declared that no competing interests exist.
PLOS ONE | https://doi.org/10.1371/journal.pone.0223320 October 3, 2019
Introduction
In recent years, the number of works proposing new experiences for Human Machine Interaction (HMI) has increased considerably, especially those based on visual gesture recognition. In [1], a rehabilitation application to improve upper-limb activity and mobility is presented. It uses a near-infrared visual device that also provides hand-skeleton information. Similarly, hand-skeleton information has been used to control the radio inside a car in the field of driver assistance [2]. Another example is the virtual reality sickness simulator presented in [3], which uses the hand-skeleton information provided by the Leap Motion device to interact with the virtual environment. In [4], a structured-light device providing depth information is used to operate a robot by hand gestures for rescue operations. A solution to control an Unmanned Aerial Vehicle (UAV) is introduced in [5], which uses depth information to recognize hand gestures. A hand gesture recognition system based on depth imagery is used in [6] to interact with a computer.
Several visual devices have been used for the task of hand gesture recognition, providing color, depth, and infrared imagery. Some of them also provide higher-level semantic information, such as a hand or body skeleton. A skeleton is a mathematical model that represents a hand or body, in a very abstract form, as a set of vertices and edges (a graph) that encodes the 3D positions of the bones and joints. No appearance information is considered in this case. In this regard, one of the most successful devices is the Leap Motion, which was the first to provide a detailed skeleton of both hands, unlike other popular visual devices that only provided high-level skeleton information of the whole body (Kinect versions 1 and 2, or Asus Xtion). More specifically, the Leap Motion provides a 23-node hand-skeleton model, compared with the 3-node hand model of Kinect version 2.
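To make the graph view of a skeleton concrete, the following Python sketch stores a hand skeleton as named joints with 3D coordinates (the vertices) and bones connecting pairs of joints (the edges). The joint names, the single-finger topology, and the coordinates are hypothetical illustrations; they are not the Leap Motion's 23-node model or any part of its SDK. As the text notes, such a structure holds only geometry and no appearance information.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class HandSkeleton:
    """Hand skeleton as a graph: vertices are named joints with 3D positions,
    edges are bones joining two joints. No appearance (pixel) data is stored."""

    joints: Dict[str, Point3D] = field(default_factory=dict)
    bones: List[Tuple[str, str]] = field(default_factory=list)

    def add_joint(self, name: str, position: Point3D) -> None:
        self.joints[name] = position

    def add_bone(self, a: str, b: str) -> None:
        # A bone may only connect joints that already exist in the graph.
        if a not in self.joints or b not in self.joints:
            raise KeyError(f"unknown joint in bone ({a!r}, {b!r})")
        self.bones.append((a, b))

    def neighbours(self, name: str) -> List[str]:
        """Joints directly connected to `name` by a bone, in bone order."""
        result = []
        for a, b in self.bones:
            if a == name:
                result.append(b)
            elif b == name:
                result.append(a)
        return result

    def bone_lengths(self) -> Dict[Tuple[str, str], float]:
        """Euclidean length of every bone, in the units of the joint coordinates."""
        return {(a, b): math.dist(self.joints[a], self.joints[b]) for a, b in self.bones}


def build_toy_index_finger() -> HandSkeleton:
    """Hypothetical wrist + index-finger chain; coordinates (mm) are made up."""
    hand = HandSkeleton()
    for name, pos in [
        ("wrist", (0.0, 0.0, 0.0)),
        ("index_mcp", (0.0, 80.0, 0.0)),
        ("index_pip", (0.0, 120.0, 0.0)),
        ("index_dip", (0.0, 145.0, 0.0)),
        ("index_tip", (0.0, 165.0, 0.0)),
    ]:
        hand.add_joint(name, pos)
    # Connect consecutive joints of the chain with bones.
    chain = ["wrist", "index_mcp", "index_pip", "index_dip", "index_tip"]
    for a, b in zip(chain, chain[1:]):
        hand.add_bone(a, b)
    return hand


if __name__ == "__main__":
    hand = build_toy_index_finger()
    print(hand.neighbours("index_pip"))
    print(hand.bone_lengths())
```

A recognizer working on this representation can only use quantities derived from joint positions (bone lengths, angles, neighbourhoods); hand poses for which the joints cannot be estimated reliably are therefore out of its reach.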
The Leap Motion device also provides near-infrared stereo images with a resolution of 640 × 240 pixels and a variable frame rate between 30 and 200 fps. However, most of the works that use the Leap Motion only use the hand-skeleton information, which is obtained from the near-infrared images by proprietary software. There are two main reasons for this. The first is that processing image-based information is much more complex than processing skeleton information; therefore, the use of skeleton information allows for faster and easier deployment of HMI applications. The second is the strong geometric distortion caused by the wide field of view of the optics of the embedded cameras, which radically changes the appearance of objects in the image. The root of the problem is twofold. First, from a human perspective, the acquired hand images have a different geometry and appearance from those perceived by a human, which poses problems for existing analysis algorithms that assume a human-perception-based representation. Second, from a machine-learning perspective, the hand appearance changes strongly according to its location in the image (which does not occur with conventional cameras), adding complexity to the feature-based characterization of the different hand gestures. This can be considered an additional degree of freedom that makes recognition/classification more challenging due to the higher intra-class variance, which ultimately results in a loss of performance.
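The position-dependent appearance can be illustrated with a simple one-dimensional projection model. The Python sketch below compares an ideal pinhole camera (image radius r = f·tan θ) with an equidistant fisheye model (r = f·θ). The fisheye model is only an assumed stand-in for the Leap Motion optics, whose lens model is not specified here, and the focal length, depth, and hand width are hypothetical values. A hand-sized segment moving parallel to the image plane keeps the same image width under the pinhole model, whereas its fisheye image shrinks as it moves away from the optical axis.

```python
import math
from typing import Callable

# Radial projection: (distance from optical axis, depth, focal length) -> image radius.
Projection = Callable[[float, float, float], float]


def pinhole_radius(rho: float, z: float, f: float) -> float:
    """Ideal perspective camera: r = f * tan(theta) = f * rho / z."""
    return f * rho / z


def equidistant_radius(rho: float, z: float, f: float) -> float:
    """Assumed equidistant fisheye model: r = f * theta, with theta = atan(rho / z)."""
    return f * math.atan2(rho, z)


def projected_width(project: Projection, center_x: float, width: float,
                    z: float, f: float) -> float:
    """Image width (pixels) of a segment of physical `width` lying on the x-axis
    at depth `z` (parallel to the image plane), centred at lateral offset `center_x`."""

    def u(x: float) -> float:
        # Project along the x-axis, keeping the sign of the lateral offset.
        return math.copysign(project(abs(x), z, f), x)

    return u(center_x + width / 2) - u(center_x - width / 2)


if __name__ == "__main__":
    f_px, depth_mm, hand_mm = 200.0, 200.0, 80.0  # hypothetical values
    for offset_mm in (0.0, 100.0, 200.0, 300.0):
        w_pin = projected_width(pinhole_radius, offset_mm, hand_mm, depth_mm, f_px)
        w_fish = projected_width(equidistant_radius, offset_mm, hand_mm, depth_mm, f_px)
        print(f"offset {offset_mm:5.0f} mm: pinhole {w_pin:5.1f} px, fisheye {w_fish:5.1f} px")
```

With these made-up parameters, the pinhole width stays at 80 px at every offset, while the fisheye width drops from about 79 px on the axis to about 25 px at a 300 mm offset. A classifier trained on such images must absorb this variation, which is one concrete source of the higher intra-class variance described above.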
Nonetheless, the near-infrared images of the Leap Motion potentially have four advantages over imaging devices with a human-like field of view, which, if appropriately exploited, can tip the scale in favour of the Leap Motion. The first is its significantly wider field of view compared with other devices, which allows a wide range of hand movement in close-interaction situations. Despite the geometric distortions it introduces, this wide field of view is important for a natural and friendly interaction, which is not possible with standard-field-of-view cameras: their reduced interaction area makes it very difficult to keep the gestures inside the sensed area. The second advantage is its superior frame rate, up to 200 fps, compared with other devices. Thus, the Leap Motion not only has more information
available to p (...truncated)