Cross domain meta-network for sketch face recognition

MATEC Web of Conferences 336, 06007 (2021), CSCNS2020
https://doi.org/10.1051/matecconf/202133606007

Yuying Shao1, Lin Cao1,*, Changwu Chen1, and Kangning Du1
1 School of Information and Communication Engineering, Beijing Information Science and Technology University, China

Abstract. Sketch images and optical images differ greatly in modality, and traditional deep learning methods overfit easily when only a small amount of training data is available. To address both problems, a Cross Domain Meta-Network for sketch face recognition is proposed. The method first designs a meta-learning training strategy to solve the small-sample problem, and then proposes a mean entropy loss and a cross domain adaptive loss to reduce the modal difference between the sketch domain and the optical domain. Experimental results on the UoM-SGFS and PRIP-VSGC sketch face data sets show that this method outperforms other sketch face recognition methods.

1 Introduction

Sketch face recognition[1] is the process of matching face photos with sketch images. Because of its important role in criminal investigations, it has received great attention in recent years[2]. The algorithms designed to handle the modal difference in sketch face recognition can be divided into traditional methods and deep learning-based methods. The traditional methods can be roughly divided into three categories[3]: feature-based methods, synthesis-based methods, and methods based on common subspace projection. Feature-based methods represent face images with local feature descriptors. However, this type of method usually loses details of the face image.
Synthesis-based methods reduce the modal difference by converting photos into sketches (or sketches into photos), and then using a face recognizer to match the synthesized sketches (photos) with the original sketches (photos). However, sketch face synthesis[4] is itself a challenging problem. Methods based on common subspace projection project facial images of different modalities into a common subspace to reduce the influence of modal differences on recognition performance. However, this type of method may lose important information in the original image.

To solve the above problems, a Cross Domain Meta-Network for sketch face recognition is proposed here. In addition, the mean entropy loss and the cross domain adaptive loss are proposed to reduce the modal difference between the sketch domain and the optical domain and to improve the recognition ability of the model.

* Corresponding author
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

2 Cross domain meta-network

In this section, the cross domain meta-network is introduced in detail. Figure 1 shows the training process of a single training episode and a single test episode.

[Figure 1: block diagram. Labels: Training episode, Testing episode, Feature extractor, Embedded space, Photo features, Sketch features, Classifier, Discriminator.]

Fig. 1. The training process of a single training episode and a single test episode. First, the feature extractor extracts the features of the images in the training episode; the metric relationship between the photo features and the sketch features is then learned in the embedding space; finally, the meta-classifier classifies the image and the mean entropy loss is calculated.
2.1 Mean entropy loss

Randomly sample $K < N$ classes from the training set $D_{train} = \{(I_1^p, y_1), (I_1^s, y_1), \ldots, (I_N^p, y_N), (I_N^s, y_N)\}$ with $N$ classes to form a training episode, where $I^p$ and $I^s$ are photos and sketches respectively, and $y$ is the corresponding label. In each training episode, all images constitute the query set $Q = \{(I_1^p, y_1), (I_1^s, y_1), \ldots, (I_K^p, y_K), (I_K^s, y_K)\}$, all the photos constitute the photo support set $S_p = \{(I_1^p, y_1), \ldots, (I_K^p, y_K)\}$, and all the sketches constitute the sketch support set $S_s = \{(I_1^s, y_1), \ldots, (I_K^s, y_K)\}$. From the test set $D_{test}$, $K$ photos are randomly sampled to form a photo test episode $E_p$ and $K$ sketches are randomly sampled to form a sketch test episode $E_s$.

Let $f(\cdot)$ denote the embedding function implemented by the feature extractor. Given a query set image $I_c$, the corresponding feature vector is $f(I_c)$. The Euclidean distances between it and the photo $I_i^p$ ($i \in \{1, \ldots, K\}$) in the photo support set $S_p$ and the sketch $I_i^s$ ($i \in \{1, \ldots, K\}$) in the sketch support set $S_s$ are:

$$d_{ci}^p = \left\| f(I_c) - f(I_i^p) \right\|_2, \qquad d_{ci}^s = \left\| f(I_c) - f(I_i^s) \right\|_2 \tag{1}$$

where $\|\cdot\|_2$ is the Euclidean distance.
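The episode construction and the distances of Eq. (1) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`sample_episode`, `euclidean`, `distances`) and the toy data are hypothetical, and each image is represented directly by a precomputed feature vector standing in for $f(I)$, whereas the paper obtains features from a trained feature extractor.

```python
import math
import random


def sample_episode(pairs, k, rng):
    """Pick K identities and build the query set and both support sets.

    `pairs` maps a label y to a (photo_feature, sketch_feature) tuple.
    Assumption: features are precomputed lists of floats standing in for f(I).
    """
    labels = rng.sample(sorted(pairs), k)                 # K < N classes
    photo_support = [(pairs[y][0], y) for y in labels]    # S_p
    sketch_support = [(pairs[y][1], y) for y in labels]   # S_s
    # Q holds every photo and every sketch of the episode, photo first.
    query = [item for ps in zip(photo_support, sketch_support) for item in ps]
    return query, photo_support, sketch_support


def euclidean(u, v):
    """Return ||u - v||_2 for two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distances(query_feature, photo_support, sketch_support):
    """Return [(d_ci^p, d_ci^s)] for i = 1..K, following Eq. (1)."""
    return [
        (euclidean(query_feature, p), euclidean(query_feature, s))
        for (p, _), (s, _) in zip(photo_support, sketch_support)
    ]


# Toy data: N = 4 identities with made-up 2-D features.
toy_pairs = {
    0: ([0.0, 0.0], [0.0, 1.0]),
    1: ([3.0, 4.0], [3.0, 3.0]),
    2: ([6.0, 8.0], [6.0, 7.0]),
    3: ([9.0, 0.0], [9.0, 1.0]),
}
query, s_p, s_s = sample_episode(toy_pairs, k=3, rng=random.Random(0))
d = distances(query[0][0], s_p, s_s)  # distances for the first query image
```

Because the first query image is also the first photo in $S_p$, its photo distance to class 1 of the episode is zero, while its sketch distance to the same class measures the modality gap for that identity.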
Therefore, the probability that the query image $I_c$ belongs to class $y_i$ of the support set is:

$$p(y = y_i \mid I_c) = \frac{\exp\left(-\frac{1}{2}\left(d_{ci}^p + d_{ci}^s\right)\right)}{\sum_{j=1}^{K} \exp\left(-\frac{1}{2}\left(d_{cj}^p + d_{cj}^s\right)\right)} \tag{2}$$

The mean entropy loss is obtained:

$$L_{mean} = \frac{1}{|Q|} \sum_{(I_c, y_c) \in Q} \left[ -\ln M_{c} + \ln \sum_{j=1}^{K} \exp\left(-\frac{1}{2}\left(d_{cj}^p + d_{cj}^s\right)\right) \right] \tag{3}$$

where $M_c$ is taken to be the numerator of Eq. (2) for the true class of $I_c$, so that each summand equals $-\ln p(y = y_c \mid I_c)$.

2.2 Cross domain adaptive loss

Aiming at the data shift between the training data and the test data, a same-modal (co-modal) adversarial loss is proposed:

$$\begin{aligned} L_{adv}^{sm} = {} & \mathbb{E}[\log D_{dom}(S_p) + \log(1 - D_{dom}(E_p))] + \mathbb{E}[\log D_{dom}(E_p) + \log(1 - D_{dom}(S_p))] \\ & + \mathbb{E}[\log D_{dom}(S_s) + \log(1 - D_{dom}(E_s))] + \mathbb{E}[\log D_{dom}(E_s) + \log(1 - D_{dom}(S_s))] \end{aligned} \tag{4}$$

where $D_{dom}(\cdot)$ is the discriminator. Aiming at the modal difference between the sketch and the photo, a cross modal adversarial loss is proposed:

$$\begin{aligned} L_{adv}^{cm} = {} & \mathbb{E}[\log D_{dom}(S_p) + \log(1 - D_{dom}(E_s))] + \mathbb{E}[\log D_{dom}(E_s) + \log(1 - D_{dom}(S_p))] \\ & + \mathbb{E}[\log D_{dom}(S_s) + \log(1 - D_{dom}(E_p))] + \mathbb{E}[\log D_{dom}(E_p) + \log(1 - D_{dom}(S_s))] \end{aligned} \tag{5}$$

Combining formula (4) and formula (5), the final cross domain adaptive loss is obtained:

$$L_{mda} = L_{adv}^{sm} + L_{adv}^{cm} \tag{6}$$

The optimization goal of this article is:

$$\min L = L_{mean} + \alpha L_{mda} \tag{7}$$

where $\alpha$ is the balance hyperparameter between the mean entropy loss and the cross domain adaptive loss.

3 Experiments

3.1 Implementation details

This paper uses the UoM-SGFS and PRIP-VSGC datasets to evaluate the effectiveness of the method. Two data sets, S1 and S2, are set up from these two datasets. In the first data set (S1), 450 sketch-photo pairs are used as the training set and 150 sketch-photo pairs are used as the test set. In the second data set (S2), the setting is the same as in S1. The MTCNN method is used to locate and align the images. Before the training process, all images are cropped to a standard size of 256×256. In the training process, the featu (...truncated)