Sparsity Based Locality-Sensitive Discriminative Dictionary Learning for Video Semantic Analysis

Mathematical Problems in Engineering, Aug 2018

Dictionary learning (DL) and sparse representation (SR) based classifiers have greatly improved classification performance and achieve good recognition rates on image data. In video semantic analysis (VSA), the local structure of video data carries vital discriminative information needed for classification, yet current DL based approaches do not fully exploit it. Moreover, video features from the same semantic category do not necessarily produce similar coding results. To address these issues, a novel learning algorithm, called sparsity based locality-sensitive discriminative dictionary learning (SLSDDL), is proposed for VSA in this paper. In the proposed algorithm, a category discriminant loss function based on group sparse coding of the sparse coefficients is introduced into the framework of the locality-sensitive dictionary learning (LSDL) algorithm. The sparse coefficients of a test video feature sample are solved by the optimization method of SLSDDL, and the video semantic classification result is obtained by minimizing the error between the original and reconstructed samples. The experimental results show that the proposed SLSDDL significantly improves the performance of video semantic detection compared with state-of-the-art approaches. Moreover, its robustness to diverse video environments is also demonstrated, which indicates the generality of the proposed approach.

Ben-Bright Benuwa,1,2 Yongzhao Zhan,1 Benjamin Ghansah,2 Ernest K. Ansah,2 and Andriana Sarkodie2

1School of Computer Science and Telecommunication Engineering, Jiangsu University, 301 Xuefu Road, Jingkou District, Zhenjiang 212013, Jiangsu, China
2School of Computer Science, Data Link Institute, P.O. Box 2481, Tema, Ghana

Correspondence should be addressed to Ben-Bright Benuwa; benuwa778@gmail.com

Received 20 March 2018; Accepted 26 June 2018; Published 5 August 2018

Academic Editor: Yakov Strelniker

Copyright © 2018 Ben-Bright Benuwa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

SR is an active research area, particularly in its application to signal reconstruction [1], and it performs well for video analysis and image classification according to the experimental outcomes discussed in [2, 3]. DL, on the other hand, aims to learn a good dictionary from training samples so that a given signal can be well represented; the quality of the dictionary is therefore crucial for efficient SR [4]. The dictionary can be determined either by using all the training samples directly as the dictionary to code the test samples (e.g., locality-constrained linear coding (LLC) [5]) or by adopting a dictionary learned from the training set (e.g., K-SVD [6] and Fisher discrimination dictionary learning (FDDL) [7]). In addition, [8] relates group-based sparse coding to a rank minimization problem and measures the sparse coefficients of each group by estimating the values of each grouping. All the methods that adopt the first strategy use the training samples themselves as the dictionary.
Although they show good classification performance, the dictionary might not be effective enough to represent the samples well because of noisy information that may accompany the original training samples, and it may also fail to fully exploit the discriminative information hidden in the training samples. The second category is also not ideal for recognition, because it only requires that the dictionary best express the training samples under a strict sparse representation. These issues were addressed by the LSDL approach, which incorporates a locality constraint into the objective function of DL and thereby ensures that the learned overcomplete dictionary is more representative. The problem with traditional sparse representation methods is that they cannot produce the same or similar coding results when the input features come from the same category. The elementary approach of building the dictionary from all the training samples may also make the dictionary huge, which is unfavorable to the sparse solver [9]. Many techniques have recently been proposed, such as the Method of Optimal Directions (MOD) [10], which updates all the atoms simultaneously with fixed sparse codes using least-squares methods and orthogonal matching pursuit for sparse coding, and the K-SVD algorithm [11], which learns a compact dictionary for sparse coding from the training data. An attempt was made to resolve this issue by further iteratively updating the K-SVD-trained dictionary (K-SVD focuses only on the representational power or efficiency of the dictionary, as discussed in [12], and not on its discrimination ability) based on the result of a linear classifier, so as to obtain a dictionary that is good for classification while retaining representational power. Other efforts in a similar direction include [13, 14], which use more sophisticated objective functions in dictionary optimization during the training phase so that the dictionary gains some discriminative power. Another family of discriminative dictionary learning methods uses both discriminative and reconstructive errors to learn the dictionary. A structured Fisher discrimination dictionary learning (FDDL) method was proposed in [15]; it improves pattern classification performance by relating the dictionary to the class labels, and [16] further exploits the discriminability of both the representation coefficients and the representation residual. Recently, a discriminative structured dictionary learning method with hierarchical group sparsity, which reduces the linear predictive error and improves the discriminability of the sparse codes, was introduced in [17]. A discriminative structured dictionary learning method for image classification that combines the discriminative properties of the reconstruction error, the representation error, and the classification error terms into the objective function was also proposed in [18]. In the same vein of improving the discriminability of dictionary learning algorithms, [19] proposed structure-adaptive dictionary learning for sparse representation based classification, [20] proposed learning a discriminative dictionary for group sparse representation, and discriminative dictionary learning with low-rank regularization for face recognition was introduced in [21].
Locality-constrained linear coding (LLC) for image classification, built on the theory of linear spatial pyramid matching using sparse coding for image classification [22], was proposed in [5]. LLC can achieve a smaller reconstruction error by reconstructing each sample with multiple local bases, which gives local smoothness in addition to sparsity. More recently, sparse dictionary learning has been advanced for video semantics in [2], which proposed a video semantic detection method based on locality-sensitive discriminant sparse representation and weighted KNN (LSDSR-WKNN) to obtain better category discrimination in the sparse representation of video semantic concepts. Despite this better category discrimination, the LSDSR-WKNN technique fails to fully exploit the potential information of the samples. A latent label consistent DL (LLCDL) method was proposed in [23]; it solves a unified framework that simultaneously minimizes the structured discriminative sparse code approximation, the classification error, and the structured sparse reconstruction error in order to boost representation and classification performance. The authors of the LLCDL algorithm had previously introduced the joint label consistent embedding and dictionary learning (JEDL) algorithm [24] based on LC-KSVD. That technique concurrently learns discriminative sparse codes, the sparse reconstruction, the classification error, and the code approximation, enabling a linear combination of signals for the sparse code autoextractor during classification. Moreover, vital similarity information of the signals is preserved by enforcing the sparse coefficients to be discriminative among different classes. However, it uses the K-SVD computation to obtain the dictionary columns, which makes it computationally expensive. All of the above SR and DL techniques demonstrate efficient performance, but they still face difficulties in tolerating the multiplicity of scenes in video, which limits further improvement in the accuracy of video semantic analysis. Based on the aforementioned observations, this paper seeks to enhance the discriminative power of sparse representation features by encoding group sparse codes of samples from the same category, and it therefore proposes sparsity based locality-sensitive discriminative dictionary learning (SLSDDL) for VSA, which is more efficient and has superior numerical stability. The SLSDDL algorithm adjusts the dictionary adaptively using the differences between the reconstructed dictionary atoms and the training samples together with the locality adaptor. Moreover, applying the locality adaptor at both the DL and sparse coding stages improves how well the algorithm exploits the locality and similarity of the dictionary atoms. The dictionary information can also be exploited by the proposed algorithm through the introduction of a discriminant loss function during the dictionary learning stage, to obtain more discriminative information and enhance the classification ability of the sparse representation features. The learned dictionaries can represent video semantics more realistically and thus provide a better representation of the samples underlying the sparse coefficients.
The main contributions of this paper are as follows:
(1) A discriminative loss function is utilized, giving an optimal dictionary for addressing sparse representation problems and optimized sparse coefficients for reconstructing the dictionary, so as to enhance the discriminative classification power of sparse representation features.
(2) The introduction of group sparsity enables efficient reconstruction of samples, especially when the size of the data gets large.
(3) A locality-sensitive discriminative dictionary learning and sparse representation scheme based on group sparsity is developed for video semantic detection. Video features are sparsely encoded in a way that better preserves the structure of the dictionary, hence improving the accuracy of video concept detection.

The rest of the paper is organized as follows: Section 2 reviews related work; Section 3 presents the proposed algorithm; Section 4 reports the experimental results; and Section 5 outlines the main conclusions and recommendations.

2. Related Works

2.1. Sparse Representation Algorithms

SR underlies signal acquisition and the associated techniques of coding, sampling, compression, transmission, and decoding, and it is one of the most essential ideas in signal processing; the classical sampling theorem states that a sampled signal can be perfectly reconstructed from its samples if the sampling rate exceeds twice the maximum frequency of the original signal [25]. SR utilizes an overcomplete dictionary to linearly reconstruct a data sample. Suppose that each sample lies in $\mathbb{R}^d$ and that the dictionary atoms are concatenated to form a matrix $D = [d_1, d_2, \ldots, d_m] \in \mathbb{R}^{d \times m}$. If any sample can be approximately represented by a linear combination of the atoms of $D$ and the number of atoms is larger than the sample dimension, i.e., $m > d$, the dictionary $D$ is referred to as an overcomplete dictionary. A signal is said to be compressible if it is sparse in the original or a transformed domain and there is no information or energy loss during the transformation. Generally, for $d < m$ and a sample $y \in \mathbb{R}^d$, the linear system of equations is defined as

$y = Dx$. (1)

The sparsest representation can be acquired by solving (1) with an $\ell_0$-norm minimization constraint [26]. Thus problem (1) can be converted to the following optimization problem:

$\hat{x} = \arg\min_{x} \|x\|_0$ s.t. $y = Dx$. (2)

Problem (2) is called the $\ell_0$-sparse approximation problem. Because real data always contain noise, representation noise is unavoidable in most cases. Thus the original model (1) can be revised to account for small noise by writing

$y = Dx + e$, (3)

where $e$ denotes the representation noise and is bounded as $\|e\|_2 \le \varepsilon$. In the presence of noise, the sparse solution of problem (2) can be approximately obtained by solving one of the following optimization problems:

$\hat{x} = \arg\min_{x} \|x\|_0$ s.t. $\|y - Dx\|_2^2 \le \varepsilon$, (4)

or

$\hat{x} = \arg\min_{x} \|y - Dx\|_2^2$ s.t. $\|x\|_0 \le \varepsilon$. (5)

Furthermore, according to the Lagrange multiplier theorem, a proper constant $\lambda$ exists such that (4) and (5) are equivalent to the following unconstrained minimization problem:

$\hat{x} = \arg\min_{x} \|y - Dx\|_2^2 + \lambda \|x\|_0$, (6)

where $\lambda$ is the Lagrange multiplier associated with the sparsity term.

2.2. Locality-Sensitive Sparse Representation

In SR, a test sample is usually represented by dictionary atoms that may not actually be its neighbors when the signal or data is reconstructed. This makes it hard to preserve data locality and can lead to poor recognition rates.
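As a concrete point of reference for the formulations above, the following is a minimal Python sketch (not from the paper; the data, names, and parameter values are illustrative assumptions) that solves the common $\ell_1$-relaxed version of (6) with iterative soft thresholding and then compares the atoms selected by the code with the atoms nearest to the sample, illustrating the locality issue raised at the start of this subsection:

    # Sketch: l1-relaxed sparse coding, min_x 0.5*||y - Dx||_2^2 + lam*||x||_1,
    # solved with ISTA. The dictionary and sample are synthetic assumptions.
    import numpy as np

    def ista_sparse_code(D, y, lam=0.05, n_iter=300):
        L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth term
        x = np.zeros(D.shape[1])
        for _ in range(n_iter):
            z = x - (D.T @ (D @ x - y)) / L      # gradient step on 0.5*||y - Dx||^2
            x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
        return x

    rng = np.random.default_rng(0)
    d, m = 20, 60                                # m > d: overcomplete dictionary
    D = rng.standard_normal((d, m))
    D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
    y = rng.standard_normal(d)

    x = ista_sparse_code(D, y)
    selected = np.flatnonzero(np.abs(x) > 1e-3)  # atoms actually used by the code
    nearest = np.argsort(np.linalg.norm(D - y[:, None], axis=0))[:selected.size]
    print("atoms used by the sparse code:", selected)
    print("atoms nearest to y:          ", nearest)

In general the two index sets do not coincide, which is exactly the locality concern that LSDL, and the proposed SLSDDL, address by weighting the coefficients with a locality adaptor.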
In the LSDL method, locality constraints are imposed in both the dictionary learning and sparse coding phases, so that a test sample is represented mainly by its neighboring dictionary atoms. The LSDL objective can be stated as

$\min_{D,X} \sum_{i=1}^{n} \left( \|y_i - D x_i\|_2^2 + \lambda \|p_i \odot x_i\|_2^2 \right)$, (7)

where $p_i$ is the locality adaptor and $\lambda$ is a scalar parameter that weighs the locality constraint. The LSDL method does not consider class information; we therefore design a discriminant loss function that uses class information to enhance sparse representation based classification and thereby improve the accuracy of image data representation and video analysis.

3. Proposed Method

The proposed SLSDDL algorithm is based on the locality-sensitive adaptor, with a discriminant loss function centered on group SR incorporated into the objective function of the locality-sensitive dictionary learning method stated in (7), as detailed below. The proposed method can achieve an optimal dictionary and further enhance the discriminability of the sparse codes while keeping the computational cost moderate. Features whose representation coefficients come from the same category may not, by themselves, achieve the same coding results; we therefore assume that such samples should be encoded as similar sparse coefficients in SR based data detection, with the objective of enhancing the discriminative power of SR based classifiers.

Assume that $Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{d \times n}$ denotes $n$ training samples from $k$ classes, where the column vector $y_i$ is the $i$th sample and the submatrix $Y_c$ consists of the column vectors from class $c$. If there are $M$ atoms in the dictionary $D = [d_1, d_2, \ldots, d_M]$, an overcomplete dictionary for SR with $M > d$, then the coding coefficients of $Y$ over $D$ can be denoted by $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{M \times n}$. Considering that image or video features of the same category should yield similar representation coefficients, and that class information is very important for classification, our SLSDDL framework is modeled as

$\min_{D,X} \sum_{i=1}^{n} \left( \|y_i - D x_i\|_2^2 + \lambda \|p_i \odot x_i\|_2^2 \right) + f(X)$ s.t. $\mathbf{1}^{\mathrm T} x_i = 1, \forall i$, (9)

where $y_i$ is a training sample, $D$ is the dictionary, $x_i$ is its sparse coefficient vector, and $f(X)$ is the discriminant loss function for group sparsity. The locality adaptor $p_i$ uses the $\ell_2$-norm, and its $k$th element is given by

$p_{ik} = \|y_i - d_k\|_2$; (8)

the symbol $\odot$ represents element-wise multiplication; $f(X)$ is the proposed discriminant loss function enforcing group-sparsity discriminability as defined below; the shift-invariance constraint $\mathbf{1}^{\mathrm T} x_i = 1$ enforces the coding result of a sample to remain the same even if the origin of the data coordinate system is shifted, as indicated in [27]; and $\lambda$ is the regularization parameter controlling the trade-off between the reconstruction error and the sparsity. Bearing in mind that sample features from the same category should have similar sparse codes, we propose a discriminant loss function based on group sparse coding of the sparse coefficients, for the purpose of enhancing the discriminative power of SR for input signals or samples. The group structure enforced on the coefficient vectors ensures that variables in the same group tend to be either zero or nonzero simultaneously, which leads to better classification results. Consequently, samples from the same category are compacted while samples from different categories are separated.
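To make the role of the locality adaptor concrete, the following is a small Python sketch (illustrative only; the array shapes, variable names, and values are assumptions, not the authors' code) that computes $p_i$ for one sample and evaluates the locality-weighted penalty $\|p_i \odot x_i\|_2^2$ appearing in (7) and (9):

    # Sketch: locality adaptor p (l2 distances to every atom) and the
    # locality-weighted penalty ||p . x||_2^2 it induces on a code x.
    import numpy as np

    def locality_adaptor(D, y):
        # p_k = ||y - d_k||_2 for each dictionary atom d_k (columns of D)
        return np.linalg.norm(D - y[:, None], axis=0)

    def locality_penalty(D, y, x):
        p = locality_adaptor(D, y)
        return float(np.sum((p * x) ** 2))        # element-wise weighting, then squared l2

    rng = np.random.default_rng(1)
    D = rng.standard_normal((122, 400))           # e.g. 122-dim features, 400 atoms
    D /= np.linalg.norm(D, axis=0)
    y = D[:, 3] + 0.05 * rng.standard_normal(122) # a sample close to atom 3

    x_near = np.zeros(400); x_near[3] = 1.0       # code using a nearby atom
    x_far = np.zeros(400); x_far[250] = 1.0       # code using a distant atom
    print("penalty with nearby atom :", locality_penalty(D, y, x_near))
    print("penalty with distant atom:", locality_penalty(D, y, x_far))

Codes that rely on atoms far from the sample incur a much larger penalty, so the minimization naturally prefers neighboring atoms, while the discriminant loss additionally pulls codes of the same class together.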
The discriminant loss function $f(X)$ is built around a within-class similarity term that enforces group sparsity: the representations are divided into class-wise groups of nonzero coefficients, where $n_c$ denotes the number of representation coefficients belonging to class $c$ and $n$ is the total number of training samples. Combining this term with an additional squared $\ell_2$ penalty on the coefficients makes the objective in (9) more stable, following the elastic net result of [28]. It is worth noting that group sparsity is enforced through the way $f(X)$ is computed. Encoding with the proposed method is significantly less expensive computationally than existing SR classification approaches because of the smaller dictionary size; the group-sparse coefficients are much easier to obtain, and they capture the dependencies among video samples of the same category.

With the weighting constant set to 1 for simplicity, (9) can be reformulated in matrix form. The reformulation is implemented by enforcing data locality, with the locality adaptor measuring the distance between a sample and each column of $D$; note that $D$ here represents all the training samples in the feature space. The dissimilarity vector $p_i$ suppresses the corresponding weights and thereby penalizes the distance between the sample and each training sample in the feature space. Furthermore, the resulting coefficients in our SLSDDL formulation may not be strictly sparse, since the locality penalty uses the $\ell_2$-norm, but they can be regarded as sparse because the representation solutions have only a few significant values, with most entries close to zero. When the objective is minimized, each sample is encoded mainly by its neighboring training samples in the feature space, and as the locality weights $p_{ik}$ grow large the corresponding coefficients shrink towards zero. Hence most coefficients become zero, with only a few taking significant values.

3.1. Optimization

The objective function is not convex in $D$ and $X$ jointly, but it can be divided into two subproblems: updating $X$ with $D$ fixed and updating $D$ with $X$ fixed. To find the desired dictionary $D$ and sparse coefficient matrix $X$, an alternating optimization is carried out iteratively, following the strategies of [15, 23, 24, 29].

Updating $X$ with $D$ Fixed (Sparse Coding Stage). When the coefficient matrix $X$ is updated with the dictionary $D$ fixed, the objective reduces to a locality-weighted coding problem that can be solved class by class. For each class, the shift-invariance constraint is handled with a Lagrange function, leading to a linear system that involves a diagonal matrix whose nonzero elements are the squares of the entries of the corresponding locality adaptor. Premultiplying the resulting optimality condition by the all-ones vector yields the Lagrange multiplier; substituting it back gives an analytical solution for the coefficients, which are then normalized.

Updating $D$ with the Sparse Coefficient Matrix Fixed (Dictionary Update Stage). During the dictionary update stage, the objective reduces to a least-squares problem in $D$: taking the partial derivative with respect to each atom $d_k$ and setting it to zero yields a linear system whose solution gives the updated dictionary atoms.

The optimal $X$ and $D$ are obtained by alternating between the sparse coding and dictionary update stages until the maximum number of iterations is reached or the reconstruction error falls below a threshold. The SLSDDL procedure is summarized in Algorithm 1.
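The alternating scheme of Section 3.1 can be sketched in Python as follows (a simplified illustration under stated assumptions, not the authors' exact update rules: the discriminant loss $f(X)$ is omitted, the locality adaptor is treated as fixed within each coding step, and the dictionary update is a plain least-squares fit with renormalized atoms). The coding step solves the locality-weighted, sum-to-one constrained problem in closed form via a Lagrange multiplier, mirroring the derivation described above:

    # Sketch: alternating optimization in the spirit of Algorithm 1.
    # Coding step: min_x ||y - Dx||^2 + lam*||p . x||^2  s.t.  1^T x = 1.
    import numpy as np

    def code_sample(D, y, lam=0.01):
        p = np.linalg.norm(D - y[:, None], axis=0)            # locality adaptor
        A = D.T @ D + lam * np.diag(p ** 2)                   # Gram matrix + locality term
        A_inv = np.linalg.inv(A + 1e-8 * np.eye(A.shape[0]))
        ones = np.ones(A.shape[0])
        x_free = A_inv @ (D.T @ y)                            # unconstrained minimizer
        mu = (1.0 - ones @ x_free) / (ones @ A_inv @ ones)    # Lagrange multiplier for 1^T x = 1
        return x_free + mu * (A_inv @ ones)

    def update_dictionary(Y, X, eps=1e-6):
        D = Y @ X.T @ np.linalg.inv(X @ X.T + eps * np.eye(X.shape[0]))
        return D / np.maximum(np.linalg.norm(D, axis=0), 1e-12)   # unit-norm atoms

    def train_slsddl_like(Y, n_atoms=40, lam=0.01, n_iter=10, seed=0):
        rng = np.random.default_rng(seed)
        D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].copy()  # init from samples
        D /= np.linalg.norm(D, axis=0)
        for _ in range(n_iter):
            X = np.column_stack([code_sample(D, Y[:, i], lam) for i in range(Y.shape[1])])
            D = update_dictionary(Y, X)
        return D, X

With a feature matrix Y of size 122 × n assembled as in Section 4.1, train_slsddl_like(Y) alternates between the two stages until the iteration budget is exhausted; the full SLSDDL method additionally folds the group-sparse discriminant loss into the coefficient update and stops early when the reconstruction error falls below a threshold.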
As an extension of the LSDL method, the proposed SLSDDL method seeks to obtain discriminative information from all the training samples. Therefore, the LSDL method is adopted to initialize the sparse coefficients of all the training samples.

Algorithm 1: Sparsity based locality-sensitive discriminative dictionary learning (SLSDDL) algorithm.

3.2. Classification Scheme

For a given test sample $y$, its representation coefficients over the learned dictionary $D$ are obtained by solving the same locality-weighted coding problem used during training, where a scalar constant measures sparsity and $\odot$ denotes element-wise multiplication. The Lagrange multiplier technique is again used to obtain the solution: the coding problem is simplified, the optimality condition is premultiplied by the all-ones vector to determine the multiplier, and substituting it back yields the analytical solution, from which the sparse coding coefficient $\hat{x}$ is obtained by normalization. Following the analysis in [30], once $\hat{x}$ has been computed for the test sample $y$, the class-wise residual is calculated as $r_i(y) = \|y - D\,\delta_i(\hat{x})\|_2$, where $\delta_i(\hat{x})$ keeps only the sparse coding coefficients associated with the $i$th class, and the test sample is finally assigned to the class with the minimum residual, $\mathrm{identity}(y) = \arg\min_i r_i(y)$.

4. Experimental Results and Analysis

This section details the experimental results and their analysis to demonstrate the effectiveness of the proposed method.

4.1. Video Shot Preprocessing

In the experiments, ordered-samples clustering based on artificial immune theory [31] was adopted as the key-frame extraction approach to obtain static image frames from each original video. Afterwards, features consisting of the 5-dimensional radial Tchebichef moments [32], the 6-dimensional gray level cooccurrence matrix (GLCM) [33], the 81-dimensional HSV color histogram [34], and the 30-dimensional multiscale local binary pattern (LBP) [35] were extracted from the key-frames. Details of these features can be found in [2]. Figure 1 shows some key-frames of the three video datasets used.

Figure 1: Some key-frames from video shots for the respective datasets.

4.2. Database Selection and Video Analysis

In this section, experiments are performed on public video datasets to demonstrate the performance of our method. We analyze and compare the performance of the proposed algorithm with typical algorithms such as K-SVD [11], FDDL, LLC [5], and LSDL [3]. Out of the many available datasets for video categorization and classification, three are used in our evaluation: the TRECVID 2012 video dataset, the OV video dataset, and the YouTube video dataset. The TRECVID 2012 dataset has airplane, baby, building, car, dog, flower, instrumental_musician, mountain, scene_text, and speech as its video semantic concepts, with each class containing 60 samples; 50 samples per class were randomly selected for training and the remaining for testing. The YouTube video dataset comprises basketball, biking, diving, golf_swing, horse_riding, soccer_juggling, swing, tennis, trampoline_jumping, volleyball, and walking as semantic concepts, with each class containing 70 samples, out of which 60 were randomly selected for training and the rest for testing. The semantic concepts of the OV dataset are aircraft, sea, rocket, parachute, road, face, satellite, and star.
Each class consists of 70 samples; for each class, 60 samples are randomly selected for training and the remaining ones for testing. The proposed approach uses the $\ell_2$-norm locality adaptor. The experimental results were obtained by twentyfold cross-validation, in which the training samples are selected randomly and the remaining samples of each video dataset are used for testing.

4.3. Parameter Selection

Several parameters are used by the proposed SLSDDL and its classification scheme: a classification parameter, a positive weighting parameter for the locality-sensitive constraint, and a weighting parameter for the discriminative constraint, with values assigned depending on the dataset. The classification parameter had little or no effect on the experimental results and was therefore set to 0.01 in all experiments. Figure 2 depicts the recognition rates of the proposed SLSDDL approach (for the optimization phase) on the TRECVID dataset as the locality and discriminative weighting parameters vary. The optimum recognition results are obtained when these two parameters are 0.01 and 0.007, respectively, and their values are fixed for each dataset by twentyfold cross-validation. It is worth noting that different parameter values were obtained for the respective datasets because of their differing underlying structure. Based on the values listed in Table 1, experiments were conducted to validate the performance of the proposed method.

Table 1: Parameter values used on the various datasets.
Figure 2: Recognition rate versus variations of the parameter values on the TRECVID dataset.

4.4. Experimental Results and Analysis: TRECVID 2012 Video Dataset

The recognition rates of video semantic detection of the proposed SLSDDL with different initial dictionary sizes are given in Table 2. SLSDDL performs best when the dictionary size is 122 × 400, which yields the highest accuracy of semantic analysis, so this size is chosen.

Table 2: Classification results for different numbers of dictionary atoms on TRECVID 2012.

The recognition rates of LLC, FDDL, LSDL, K-SVD, and SLSDDL are shown in Table 3. The proposed SLSDDL obtains higher recognition rates than the other approaches. Figure 3 presents the average recognition rates of the various video semantic detection methods for each class on the TRECVID video dataset, which shows that the proposed method is best among the compared approaches on most categories. On the TRECVID dataset, for instance, the recognition rate of the proposed method exceeds that of K-SVD by 8.23% and that of LSDL by 7.86%, confirming its effectiveness. Hence, the proposed SLSDDL approach can effectively improve the accuracy of TRECVID video semantic detection. Nevertheless, the recognition rates of the proposed SLSDDL are lower than those of K-SVD for the Instrument and Mountain categories, for which K-SVD performs better on the TRECVID 2012 dataset.

Table 3: Recognition rates of different video semantic analysis algorithms on TRECVID 2012.
Figure 3: Recognition rates on TRECVID 2012 videos with different algorithms.
4.5. Experimental Results and Analysis: OV Video Dataset

The recognition rates of video semantic detection of the proposed SLSDDL with different initial dictionary sizes are given in Table 4. SLSDDL performs best when the dictionary size is 122 × 500, which yields the highest accuracy of semantic analysis, so this size is chosen.

Table 4: Classification results for different numbers of dictionary atoms on the OV dataset.

The recognition rates of LLC, FDDL, LSDL, K-SVD, and SLSDDL are shown in Table 5. The proposed SLSDDL obtains higher recognition rates than the other approaches. Figure 4 presents the average recognition rates of the various video semantic detection methods for each class on the OV video dataset, which shows that the proposed method is best among the compared approaches on most categories. On the OV dataset, for instance, the recognition rate of the proposed method exceeds that of K-SVD by 10.86% and that of LSDL by 10.41%, confirming its effectiveness. Hence, the proposed SLSDDL approach can effectively improve the accuracy of OV video semantic detection.

Table 5: Recognition rates of different video semantic analysis algorithms on the OV dataset.
Figure 4: Recognition rates on OV videos with different algorithms.

4.6. Experimental Results and Analysis: YouTube Video Dataset

The recognition rates of video semantic detection of the proposed SLSDDL with different initial dictionary sizes are given in Table 6. SLSDDL performs best when the dictionary size is 122 × 550, which yields the highest accuracy of semantic analysis, so this size is chosen.

Table 6: Classification results for different numbers of dictionary atoms on the YouTube dataset.

The recognition rates of LLC, FDDL, LSDL, K-SVD, and SLSDDL are shown in Table 7. The proposed SLSDDL obtains higher recognition rates than the other approaches. Figure 5 presents the average recognition rates of the various video semantic detection methods for each class on the YouTube video dataset, which shows that the proposed method is best among the compared approaches on most categories. On the YouTube dataset, the recognition rate of the proposed method exceeds that of K-SVD by 8.9% and that of LSDL by 9.36%, confirming its effectiveness. Hence, the proposed SLSDDL approach can effectively improve the accuracy of YouTube video semantic detection.

Table 7: Recognition rates of different video semantic analysis algorithms on the YouTube dataset.
Figure 5: Recognition rates on YouTube videos with different algorithms.

5. Conclusion and Recommendations

In this paper, a sparsity based locality-sensitive discriminative dictionary learning method is proposed. The approach builds on the observation that video samples from the same category should be encoded as similar sparse coefficients in sparse representation based video semantic detection, so as to enhance the discriminative power of the sparse representation features. The paper also introduces a loss function based on the sparse coefficients into the structure of the locality-sensitive algorithm to optimize the dictionary.
The SLSDDL enhances the discriminative power of sparse representation features through Fisher-like discrimination principles, giving a better-optimized dictionary as a result of the imposed constraints and thereby improving video semantic classification. Based on the experimental results, the proposed SLSDDL algorithm for video semantic analysis is effective and outperforms state-of-the-art approaches such as LLC, FDDL, LSDL, and K-SVD. Despite the superior results of the proposed SLSDDL on the TRECVID, YouTube, and OV video datasets, there is still a need to reduce the execution time and to further improve the discriminative power; in future work, we therefore plan to use deep learning, as discussed in [36], for extracting the video features and to introduce kernels into the framework.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grants nos. 61170126 and 61502208), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant no. 14KJB520007), the China Postdoctoral Science Foundation (Grant no. 2015M570411), the Natural Science Foundation of Jiangsu Province of China (Grant no. BK20150522), and the Research Foundation for Talented Scholars of Jiangsu University (Grant no. 14JDG037).

References

[1] Z. Zhang, Y. Xu, J. Yang, X. Li, and D. Zhang, "A survey of sparse representation: algorithms and applications," IEEE Access, vol. 3, pp. 490–530, 2015.
[2] Y. Zhan, J. Liu, J. Gou, and M. Wang, "A video semantic detection method based on locality-sensitive discriminant sparse representation and weighted KNN," Journal of Visual Communication and Image Representation, vol. 41, pp. 65–73, 2016.
[3] C.-P. Wei, Y.-W. Chao, Y.-R. Yeh, and Y.-C. F. Wang, "Locality-sensitive dictionary learning for sparse representation based classification," Pattern Recognition, vol. 46, no. 5, pp. 1277–1287, 2013.
[4] F. Meng, X. Yang, C. Zhou, and Z. Li, "A sparse dictionary learning-based adaptive patch inpainting method for thick clouds removal from high-spatial resolution remote sensing imagery," Sensors, vol. 17, no. 9, 2017.
[5] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, "Locality-constrained linear coding for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3360–3367, 2010.
[6] Q. Zhang and B. Li, "Discriminative K-SVD for dictionary learning in face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2691–2698, 2010.
[7] H. Zheng and D. Tao, "Discriminative dictionary learning via Fisher discrimination K-SVD algorithm," Neurocomputing, vol. 162, pp. 9–15, 2015.
[8] Z. Zha et al., "Analyzing the group sparsity based on the rank minimization methods," 2016.
[9] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[10] K. Engan, S. O. Aase, and J. H. Hakon-Husoy, "Method of optimal directions for frame design," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), vol. 5, pp. 2443–2446, Phoenix, Ariz, USA, 1999.
[11] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
[12] D.-S. Pham and S. Venkatesh, "Joint learning and dictionary construction for pattern recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), 2008.
[13] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Discriminative learned dictionaries for local image analysis," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, Anchorage, Alaska, USA, 2008.
[14] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Supervised dictionary learning," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1033–1040, Vancouver, Canada, 2009.
[15] M. Yang, L. Zhang, X. C. Feng, and D. Zhang, "Fisher discrimination dictionary learning for sparse representation," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 543–550, Barcelona, Spain, 2011.
[16] M. Yang, L. Zhang, X. Feng, and D. Zhang, "Sparse representation based Fisher discrimination dictionary learning for image classification," International Journal of Computer Vision, vol. 109, no. 3, pp. 209–232, 2014.
[17] Y. Xu, Y. Sun, Y. Quan, and B. Zheng, "Discriminative structured dictionary learning with hierarchical group sparsity," Computer Vision and Image Understanding, vol. 136, pp. 59–68, 2015.
[18] P. Wang, J. Lan, Y. Zang, and Z. Song, "Discriminative structured dictionary learning for image classification," Transactions of Tianjin University, vol. 22, no. 2, pp. 158–163, 2016.
[19] H. Chang, M. Yang, and J. Yang, "Learning a structure adaptive dictionary for sparse representation based classification," Neurocomputing, vol. 190, pp. 124–131, 2016.
[20] Y. Sun, Q. Liu, J. Tang, and D. Tao, "Learning discriminative dictionary for group sparse representation," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3816–3828, 2014.
[21] L. Li, S. Li, and Y. Fu, "Discriminative dictionary learning with low-rank regularization for face recognition," in Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2013), China, 2013.
[22] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 1794–1801, 2009.
[23] Z. L. Jiang, Z. Lin, and L. S. Davis, "Learning a discriminative dictionary for sparse coding via label consistent K-SVD," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 1697–1704, Providence, RI, USA, 2011.
[24] S. Cai, W. Zuo, L. Zhang, X. Feng, and P. Wang, "Support vector guided dictionary learning," in Computer Vision – ECCV 2014, vol. 8692 of Lecture Notes in Computer Science, pp. 624–639, Springer, Cham, 2014.
[25] B.-B. Benuwa, B. Ghansah, D. K. Wornyo, and S. A. Adabunu, "A comprehensive review of particle swarm optimization," International Journal of Engineering Research in Africa, vol. 23, pp. 141–161, 2016.
[26] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 5, pp. 2197–2202, 2003.
[27] K. Yu, T. Zhang, and Y. Gong, "Nonlinear learning using local coordinate coding," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 2223–2231, Canada, 2009.
[28] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 67, no. 2, pp. 301–320, 2005.
[29] Z. Feng, M. Yang, L. Zhang, Y. Liu, and D. Zhang, "Joint discriminative dimensionality reduction and dictionary learning for face recognition," Pattern Recognition, vol. 46, no. 8, pp. 2134–2143, 2013.
[30] M. Harandi and M. Salzmann, "Riemannian coding and dictionary learning: kernels to the rescue," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '15), pp. 3926–3935, USA, 2015.
[31] Y. Zhan, M. Wang, and J. Ke, "Video key-frame extraction using ordered samples clustering based on artificial immune," Journal of Jiangsu University (Natural Science Edition), vol. 33, no. 2, pp. 199–204, 2012.
[32] R. Mukundan, "Radial Tchebichef invariants for pattern recognition," in Proceedings of the IEEE Region 10 Conference (TENCON 2005), Australia, 2005.
[33] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, vol. 3, no. 6, pp. 610–621, 1973.
[34] M. Kim, "Efficient histogram dictionary learning for text/image modeling and classification," Data Mining and Knowledge Discovery, vol. 31, no. 1, pp. 203–232, 2017.
[35] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971–987, 2002.
[36] B.-B. Benuwa, Y. Zhan, B. Ghansah, D. K. Wornyo, and F. B. Kataka, "A review of deep machine learning," International Journal of Engineering Research in Africa, vol. 24, pp. 124–136, 2016.


Ben-Bright Benuwa, Yongzhao Zhan, Benjamin Ghansah, Ernest K. Ansah, Andriana Sarkodie. Sparsity Based Locality-Sensitive Discriminative Dictionary Learning for Video Semantic Analysis, Mathematical Problems in Engineering, 2018, DOI: 10.1155/2018/9312563