Human Depth Sensors-Based Activity Recognition Using Spatiotemporal Features and Hidden Markov Model for Smart Environments
Hindawi Publishing Corporation
Journal of Computer Networks and Communications
Volume 2016, Article ID 8087545, 11 pages
http://dx.doi.org/10.1155/2016/8087545
Research Article
Ahmad Jalal,1 Shaharyar Kamal,2 and Daijin Kim1
1 Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea
2 KyungHee University, Suwon, Republic of Korea
Correspondence should be addressed to Ahmad Jalal;
Received 30 June 2016; Accepted 15 September 2016
Academic Editor: Liangtian Wan
Copyright © 2016 Ahmad Jalal et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Advances in depth imaging technologies have made human activity recognition (HAR) reliable without attaching
optical markers or any other motion sensors to human body parts. This study presents a depth imaging-based HAR system to
monitor and recognize human activities. We propose a spatiotemporal features approach to detect, track, and recognize
human silhouettes using a sequence of RGB-D images. The proposed HAR framework first
detects human depth silhouettes in the raw depth image sequence, removes background noise, and tracks the
silhouettes using frame differentiation constraints of human motion information. Spatiotemporal features are then extracted
from these depth silhouettes based on depth sequential history, motion identification, optical flow, and joints information. These features are
processed by principal component analysis for dimension reduction and better feature representation. Finally, the reduced features
are used to train a hidden Markov model, which recognizes the activities. We evaluate the proposed
approach on three challenging depth video datasets: IM-DailyDepthActivity, MSRAction3D, and MSRDailyActivity3D.
The experimental results show that the proposed approach outperforms the state-of-the-art methods.
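The final recognition step can be illustrated with a minimal sketch, assuming discrete observation symbols and one hidden Markov model per activity; the recognized activity is the model with the highest likelihood. The activity names, the toy parameters in TOY_MODELS, and the function names are hypothetical and are not taken from the paper.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm."""
    if not obs:
        return 0.0
    n_states = len(start)
    # Initialise with the start distribution and the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then apply the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescale to avoid underflow and accumulate the log of the scale.
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def recognize(obs, models):
    """Pick the activity whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical toy models: (start, transition, emission) per activity.
TOY_MODELS = {
    "walking": ([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "sitting": ([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}

print(recognize([0, 0, 0, 0], TOY_MODELS))
```

In a full system the observation symbols would come from quantizing the reduced feature vectors, and the model parameters would be learned from training sequences rather than set by hand.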
1. Introduction
Human tracking and activity recognition is defined as
recognizing different activities by applying activity feature
extraction and pattern recognition techniques to specific input data from innovative sensors (i.e., motion sensors
and video cameras) [1–5]. In recent years, advances in
these sensors have boosted the development of novel techniques
for pervasive human tracking, observing human motion,
detecting uncertain events [6–8], silhouette tracking, and
emotion recognition in real-world environments [9–11].
The term most commonly used to
cover all these topics is human tracking
and activity recognition [12–14]. In motion sensors-based activity recognition, activities are recognized by
classifying sensory data from one or more sensor devices.
In [15], Casale et al. presented a complete review of
the state-of-the-art activity classification methods using data
from one or more accelerometers. In that work, the classification
approach is based on RFs features, which classify five
daily routine activities from a Bluetooth accelerometer placed
on the chest, using a 319-dimensional feature
vector. In [16], a fast Fourier transform (FFT) and decision tree classifier algorithm is proposed to detect physical activity using biaxial
accelerometers attached to different parts of the human
body. However, these motion sensors-based approaches are
not practical because users find it uncomfortable
to wear electronic sensors in their daily life. Also,
combining multiple sensors to improve recognition
performance incurs a high computational load. Thus, video-based human tracking and activity recognition is proposed,
in which depth features are extracted from an RGB-D video
camera.
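As a rough illustration of the frequency-domain features used in accelerometer-based approaches such as [16], the sketch below computes a few spectral descriptors from one window of samples. It is only an assumption about what such features might look like: a plain discrete Fourier transform stands in for the FFT, and the function names and the chosen descriptors are hypothetical.

```python
import cmath
import math

def dft_magnitudes(window):
    """Normalised magnitudes of the non-negative-frequency DFT bins."""
    n = len(window)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(window))
        mags.append(abs(acc) / n)
    return mags

def spectral_features(window):
    """Hypothetical feature set for one accelerometer axis window."""
    mags = dft_magnitudes(window)
    return {
        # Magnitude of the zero-frequency bin (absolute mean of the window).
        "dc_magnitude": mags[0],
        # Sum of squared magnitudes over the non-DC bins.
        "energy": sum(m * m for m in mags[1:]),
        # Index of the strongest non-DC frequency bin.
        "dominant_bin": max(range(1, len(mags)), key=lambda k: mags[k]),
    }

# A synthetic window: three full cycles of a sine over 32 samples.
window = [math.sin(2 * math.pi * 3 * i / 32) for i in range(32)]
print(spectral_features(window)["dominant_bin"])
```

Feature dictionaries of this kind could then be passed to any standard classifier, such as a decision tree.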
Figure 1: System architecture of the proposed human activity recognition system. [Blocks shown: depth image sequence; preprocessing (background denoising techniques, segmentation and tracking); feature extraction (depth shape features, joints points features); feature vectors; clustering techniques based on K-mean method; recognizer engine (maximum likelihood); training and recognized activity.]
Depth silhouettes have contributed substantially to human tracking and
activity recognition and are the most popular representation from which useful human shape features
are extracted. They are the subject of ongoing research
and are used in practical applications including life-care
systems, surveillance systems, security systems, face verification, patient monitoring systems, and human gait recognition systems. In [17], several algorithms are developed for
feature extraction from the silhouette data of the tracked
human subject using depth images as the pixel source. The
extracted features include the ratio of height to weight of the tracked
human subject. Motion characteristics and distance
parameters are also used as features for activity recognition.
In [14], a translation and scaling invariant
features approach for life logging is designed in which 2D maps are computed
through the Radon transform and then reduced to
1D feature profiles through the R transform. These features are
further reduced by PCA and symbolized by the Linde, Buzo,
and Gray (LBG) clustering technique to train and recognize
different activities. In [18], a discriminative representation
method is proposed based on structure-motion kinematics features,
including the structure similarity and head-floor distance,
derived from skeleton joint points information. These
trajectory projection based kinematic features are
learned by an SVM classifier to recognize activities using
the depth maps. In [19], an activity recognition system is
designed to provide continuous monitoring and recording
of daily life activities. The system takes depth silhouettes
as input to produce a skeleton model and its body points
information. This information is
converted into a set of magnitude and direction angle
features, which are then used for training and testing
via hidden Markov models (HMMs). These state-of-the-art
methods [14, 17–19] improved recognition
accuracy using depth silhouettes. However, it is still difficult
to find the best features from limited information such as joint
points information, especially during occlusions, which
degrades recognition accuracy. Therefore, a methodology is needed
that combines
full-body silhouettes and joints information to improve
activity recognition performance.
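The magnitude and direction angle features mentioned for [19] can be pictured with a small sketch. It assumes 2D joint coordinates and measures each joint's displacement between two consecutive frames; the exact formulation in [19] may differ, and the function name and sample coordinates are hypothetical.

```python
import math

def joint_motion_features(prev_joints, curr_joints):
    """Per-joint (magnitude, direction angle in radians) between two frames."""
    features = []
    for (x0, y0), (x1, y1) in zip(prev_joints, curr_joints):
        dx, dy = x1 - x0, y1 - y0
        # Magnitude is the Euclidean displacement; the angle is measured
        # from the positive x axis and lies in [-pi, pi].
        features.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return features

# Hypothetical joint positions (e.g., head and right hand) in two frames.
prev_frame = [(0.0, 0.0), (1.0, 1.0)]
curr_frame = [(3.0, 4.0), (1.0, 2.0)]
print(joint_motion_features(prev_frame, curr_frame))
```

Features of this kind depend only on the tracked joints, which is why they become unreliable when joints are occluded.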
In this paper, we proposed a novel method to recognize
activities using sequence of depth images. During preprocessing steps, we extracted human depth silhouet (...truncated)