A real-time gesture recognition system using near-infrared imagery

RESEARCH ARTICLE

Tomás Mantecón*, Carlos R. del-Blanco, Fernando Jaureguizar, Narciso García

Grupo de Tratamiento de Imágenes, Information Processing and Telecommunications Center and ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain

* Corresponding author

Abstract

Visual hand gesture recognition systems are promising technologies for Human Computer Interaction, as they allow a more immersive and intuitive interaction. Most of these systems are based on the analysis of skeleton information, which is in turn inferred from color, depth, or near-infrared imagery. However, skeleton information can only be extracted robustly from images for a subset of hand poses, which restricts the range of gestures that can be recognized. In this paper, a real-time hand gesture recognition system based on a near-infrared device is presented. It directly analyzes the near-infrared imagery to infer static and dynamic gestures, without using skeleton information. Thus, a much wider range of hand gestures can be recognized than with skeleton-based approaches. To validate the proposed system, a new dataset of near-infrared imagery has been created, on which the system obtains good results that outperform other state-of-the-art strategies.

Citation: Mantecón T, del-Blanco CR, Jaureguizar F, García N (2019) A real-time gesture recognition system using near-infrared imagery. PLoS ONE 14(10): e0223320. https://doi.org/10.1371/journal.pone.0223320

Editor: Wajid Mumtaz, National University of Sciences and Technology, PAKISTAN

Received: April 10, 2019
Accepted: September 18, 2019
Published: October 3, 2019

Copyright: © 2019 Mantecón et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are available at http://www.gti.ssr.upm.es/data/MultiModalHandGesture_dataset.

Funding: This work has been partially supported by the Ministerio de Economía, Industria y Competitividad (AEI/FEDER) of the Spanish Government under project TEC2016-75981 (IVME). There was no additional external funding received for this study.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In recent years, the number of works proposing new experiences for Human Machine Interaction (HMI) has increased considerably, especially those based on visual gesture recognition. In [1], a rehabilitation application to improve upper limb activity and mobility is presented. It uses a near-infrared visual device that also provides hand-skeleton information. Similarly, hand-skeleton information has been used to control the radio inside a car in the field of driver assistance [2]. Another example is the virtual reality sickness simulator presented in [3], which uses hand-skeleton information provided by the Leap Motion device to interact with the virtual environment. In [4], a structured-light device providing depth information is used to operate a robot by hand gestures in rescue operations. A solution to control an Unmanned Aerial Vehicle (UAV) that uses depth information to recognize hand gestures is introduced in [5]. A hand gesture recognition system based on depth imagery is used in [6] to interact with a computer.

Several visual devices, providing color, depth, and infrared imagery, have been used for hand gesture recognition. Some of them also provide higher-level semantic information, such as a hand or body skeleton. A skeleton is an abstract mathematical model that represents a hand or body as a set of vertices and edges (a graph) encoding the 3D positions of the bones and joints. No appearance information is considered in this representation. In this regard, one of the most successful devices is the Leap Motion, which was the first to provide a detailed skeleton of both hands, unlike other popular visual devices (Kinect versions 1 and 2, or Asus Xtion) that only provide coarse skeleton information of the whole body. More specifically, the Leap Motion provides a 23-node hand-skeleton model, compared with the 3-node hand model of Kinect version 2.

The Leap Motion device also provides near-infrared stereo images with a resolution of 640 × 240 pixels and a variable frame rate between 30 and 200 fps. However, most works that use the Leap Motion rely only on the hand-skeleton information, which is obtained from the near-infrared images by proprietary software. There are two main reasons for this. The first is that processing image-based information is much more complex than processing skeleton information, so using skeleton information allows HMI applications to be deployed faster and more easily. The second is the strong geometric distortion caused by the wide field of view of the optics of the embedded cameras, which radically changes the appearance of objects in the image. The problem is twofold. First, from a human perspective, the acquired hand images have a different geometry and appearance from those perceived by a human, which poses problems for existing analysis algorithms that assume a human-perception-based representation.
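The position dependence of this distortion can be made concrete with a simple lens model. The Python sketch below is illustrative only and rests on stated assumptions: the hand is treated as a small patch on a plane parallel to the sensor, and an ideal pinhole (rectilinear) camera, with image radius r = f·ρ/Z, is compared against an equidistant fisheye lens, with r = f·θ and θ = atan(ρ/Z). The equidistant model is a common textbook approximation for wide-angle optics, not the documented optics of the Leap Motion, and the functions `local_scales` and `aspect_ratio`, as well as the depth and focal values, are hypothetical. For each off-axis angle, the sketch reports the ratio between the radial and tangential image scales; a value of 1 means the patch keeps its shape.

```python
import math


def local_scales(rho, depth, focal, model):
    """Return (radial, tangential) image scale factors for a small patch.

    Assumption: the patch lies on a plane parallel to the sensor, at lateral
    distance `rho` from the optical axis and distance `depth` from the camera.
    Scales are in image units per unit of lateral distance on that plane.
    """
    theta = math.atan2(rho, depth)  # angle between the ray and the optical axis
    if model == "pinhole":
        # Rectilinear projection r = f * rho / Z is linear in rho,
        # so radial and tangential scales are identical everywhere.
        return focal / depth, focal / depth
    if model == "equidistant":
        # Assumed fisheye projection r = f * theta, with theta = atan(rho / Z).
        radial = focal * depth / (depth**2 + rho**2)  # dr/drho
        # Tangential scale is r / rho; on the axis its limit is f / Z.
        tangential = focal * theta / rho if rho > 0 else focal / depth
        return radial, tangential
    raise ValueError(f"unknown model: {model!r}")


def aspect_ratio(angle_deg, model, depth=0.3, focal=1.0):
    """Radial/tangential scale ratio for a patch seen angle_deg off-axis.

    1.0 means the patch keeps its shape; values below 1.0 mean it is
    squashed along the radial direction. `depth` and `focal` are arbitrary
    illustrative values; they cancel out of the ratio.
    """
    rho = depth * math.tan(math.radians(angle_deg))
    radial, tangential = local_scales(rho, depth, focal, model)
    return radial / tangential


if __name__ == "__main__":
    # Off-axis angles chosen for illustration only.
    for angle in (0, 30, 60, 75):
        print(
            f"{angle:>2} deg off-axis: "
            f"pinhole={aspect_ratio(angle, 'pinhole'):.3f}  "
            f"equidistant={aspect_ratio(angle, 'equidistant'):.3f}"
        )
```

Under these assumptions, the pinhole ratio is exactly 1 at every position, whereas the fisheye ratio works out to sin(2θ)/(2θ), so a patch seen 60° off-axis is compressed radially to roughly 0.41 of its tangential extent. The same hand therefore produces a differently shaped image depending on where it appears in the frame.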
Second, from a machine-learning perspective, the appearance of the hand changes strongly with its location in the image (which does not occur with conventional cameras), adding complexity to the feature-based characterization of the different hand gestures. This location dependence acts as a new degree of freedom that makes recognition and classification more challenging, because it increases the intra-class variance, which ultimately results in a loss of performance. Nonetheless, the near-infrared images of the Leap Motion potentially offer four advantages over imaging devices with human-like fields of view which, if appropriately exploited, can tip the scale in favour of the Leap Motion. The first is its significantly wider field of view compared with other devices, which allows a wide range of hand movement in close-interaction situations. Despite the geometric distortion it introduces, this wide field of view is important for a natural and friendly interaction, which is not possible with cameras with a standard field of view: their reduced interaction area makes it very difficult to perform the gestures inside the sensed area. The second advantage is its higher frame rate, up to 200 fps, compared with other devices. Thus, the Leap Motion not only has more information available to p (...truncated)