Classification of Multi-class Daily Human Motion using Discriminative Body Parts and Sentence Descriptions

International Journal of Computer Vision, Nov 2017

In this paper, we propose a motion model that focuses on the discriminative parts of the human body related to target motions in order to classify human motions into specific categories, and we apply this model to multi-class daily motion classification. We extend this model to a motion recognition system that generates multiple sentences associated with human motions. The motion model is evaluated on four datasets acquired either by a Kinect sensor or by multiple infrared cameras in a motion capture studio: UCF-kinect, UT-kinect, HDM05-mocap and YNL-mocap. We also evaluate the sentences generated from a dataset of motion and language pairs. The experimental results indicate that the motion model improves classification accuracy and that our approach outperforms other state-of-the-art methods on specific datasets, namely those that include human–object interactions with variations in the duration of motions, such as daily human motions. We achieve a classification rate of 81.1% for multi-class daily motion classification in a non-cross-subject setting. Additionally, the sentences generated by the motion recognition system are semantically and syntactically appropriate descriptions of the target motion, which may lead to human–robot interaction using natural language.



Yusuke Goutsu, Wataru Takano, Yoshihiko Nakamura

Affiliations:
Center for Mathematical Modeling and Data Science, Osaka University, 1-3 Machikaneyamacho, Toyonaka-shi, Osaka, Japan
Computer Vision Research Group, Advanced Industrial Science and Technology (AIST), Central 1, 1-1-1 Umezono, Tsukuba, Ibaraki, Japan
Mechano-Informatics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan

Communicated by Koichi Kise
Keywords: Hidden Markov model; Fisher vector; Multiple kernel learning; Motion classification; Multi-class; Sentence description

1 Introduction

As social demand has shifted from industrial uses to service uses, robots and systems have become more intelligent and are now a familiar presence in our daily lives. Along with this change, intelligent robots and systems used in human living areas are expected to observe humans closely, understand human behavior, grasp their intentions and give proper livelihood support. Classifying daily human motions into specific categories plays an important role because a failure to do so could cause danger or inconvenience to humans.

An intuitive and common method to represent human motions is to use sequences of skeleton configurations. Optical motion capture systems provide accurate 3D skeleton markers of motion by using multiple infrared cameras. These systems are limited to use in motion capture studios, and subjects have to wear cumbersome devices while performing motions. However, the release of low-cost, marker-less motion sensors, such as the Kinect developed by Microsoft, has recently made skeleton-position extraction much easier and more practical for skeleton-based motion classification (Shotton et al. 2013). Presti and Cascia (2016) have reviewed the many works related to skeleton-based motion classification.

In this context, we proceed on the basis of the following two findings. First, local motion features derived from discriminative parts of the human body are more useful than a global motion feature derived from the whole body. This is because the discriminative body parts differ according to the target motion. For example, the “punch” motion mainly uses one arm, the “clap” motion mainly uses both arms and the “run” motion mainly uses both legs.
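The idea that different motions are carried by different body parts can be illustrated with a minimal sketch. It assumes a skeleton sequence is stored as a list of frames, each mapping a joint name to a 3D position, and it ranks body parts by mean squared frame-to-frame displacement. This "motion energy" score is a simple stand-in chosen for illustration; it is not the model proposed in the paper, and the joint names, the `BODY_PARTS` grouping and the helper functions are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = Dict[str, Point]

# Hypothetical grouping of joints into body parts; the joint names are
# illustrative and are not taken from the paper or from any sensor SDK.
BODY_PARTS: Dict[str, List[str]] = {
    "right_arm": ["r_elbow", "r_hand"],
    "left_arm": ["l_elbow", "l_hand"],
    "legs": ["l_foot", "r_foot"],
}


def part_energy(frames: Sequence[Frame], joints: List[str]) -> float:
    """Mean squared frame-to-frame displacement over the given joints."""
    total = 0.0
    count = 0
    for prev, cur in zip(frames, frames[1:]):
        for joint in joints:
            total += sum((c - p) ** 2 for p, c in zip(prev[joint], cur[joint]))
            count += 1
    return total / count if count else 0.0


def rank_parts(
    frames: Sequence[Frame], parts: Dict[str, List[str]] = BODY_PARTS
) -> List[Tuple[str, float]]:
    """Return (part name, energy) pairs, most active part first."""
    scores = [(name, part_energy(frames, joints)) for name, joints in parts.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def toy_punch(n_frames: int = 5) -> List[Frame]:
    """Synthetic 'punch': only the right arm moves, along the x axis."""
    return [
        {
            "r_elbow": (0.1 * t, 1.0, 0.0),
            "r_hand": (0.2 * t, 1.0, 0.0),
            "l_elbow": (-0.3, 1.0, 0.0),
            "l_hand": (-0.4, 0.8, 0.0),
            "l_foot": (-0.1, 0.0, 0.0),
            "r_foot": (0.1, 0.0, 0.0),
        }
        for t in range(n_frames)
    ]


if __name__ == "__main__":
    for name, energy in rank_parts(toy_punch()):
        print(f"{name}: {energy:.4f}")
```

On the synthetic sequence only the right arm moves, so it is ranked first. A score of this kind only measures how much a part moves; it says nothing about whether that movement separates one motion class from another, which is the sense of "discriminative" used in the text.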
Second, it is also desirable to classify daily human motions systematically by focusing on the discriminative body parts related to the target motion. This is because human motion is an interaction between objects in the environment and the body parts in contact with them. For example, the relationship between the positions of a hand and the face becomes important in the “make a phone call” or “drink water” motions because of the contact between an object and an ear or the mouth, respectively.

However, simply classifying human motions cannot directly lead to behavior support. A connection to other information is also required for the highly intelligent processing referred to as “motion recognition”. Here, humans differ from other animals in that they can understand the real world using natural language and engage in complex communication with others. In order to understand the real world in the same way, it is important for intelligent robots and systems to link the real world with natural language. Therefore, we also use the properties of natural language, which offers scalability through the use of large-scale language corpora and interpretability by humans. By connecting human motions to common words, motion classification expands to include a (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2Fs11263-017-1053-3.pdf

Yusuke Goutsu, Wataru Takano, Yoshihiko Nakamura. Classification of Multi-class Daily Human Motion using Discriminative Body Parts and Sentence Descriptions, International Journal of Computer Vision, 2017, pp. 495-514, Volume 126, Issue 5, DOI: 10.1007/s11263-017-1053-3