Cross domain meta-network for sketch face recognition
MATEC Web of Conferences 336, 06007 (2021)
CSCNS2020
https://doi.org/10.1051/matecconf/202133606007
Yuying Shao1, Lin Cao1,* , Changwu Chen1, and Kangning Du1
1 School of Information and Communication Engineering, Beijing Information Science and
Technology University, China
Abstract. Sketch images and optical images differ greatly in modality, and
traditional deep learning methods tend to overfit when only a small amount
of training data is available. To address both problems, a Cross Domain
Meta-Network for sketch face recognition is proposed. The method first
designs a meta-learning training strategy to handle the small-sample
problem, and then introduces a mean entropy loss and a cross domain
adaptive loss to reduce the modal difference between the sketch domain and
the optical domain. Experiments on the UoM-SGFS and PRIP-VSGC sketch face
data sets compare this method with other sketch face recognition methods.
1 Introduction
Sketch face recognition[1] is the process of matching face photos with sketch images.
Because of its important role in criminal investigations, it has received great attention in
recent years[2].
Existing algorithms for sketch face recognition, which must cope with the modal
difference between sketches and photos, can be divided into traditional methods and deep
learning-based methods. Traditional methods fall roughly into three categories[3]:
feature-based methods, synthesis-based methods, and common subspace projection methods.
Feature-based methods represent face images with local feature descriptors; however, they
usually lose details of the face image. Synthesis-based methods convert photos (sketches)
into sketches (photos) and then use a face recognizer to match the synthesized sketches
(photos) with the original sketches (photos); however, sketch face synthesis[4] is itself
a challenging task. Common subspace projection methods project face images of different
modalities into a common subspace to reduce the influence of modal differences on
recognition performance; however, they may lose important information in the original
image.
To solve the above problems, a Cross Domain Meta-Network for sketch face
recognition is proposed here. In addition, a mean entropy loss and a cross domain adaptive
loss are proposed to reduce the modal difference between the sketch domain and the optical
domain and to improve the recognition ability of the model.
* Corresponding author:
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons
Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).
2 Cross domain meta-network
In this section, the Cross Domain Meta-Network is introduced in detail.
Figure 1 shows the training process of a single training episode and a single test episode.
[Figure 1 diagram. Panels: Training episode, Testing episode. Components: Feature extractor, Photo features, Sketch features, Embedded space, Classifier, Discriminator.]
Fig. 1. The training process of a single training episode and a single test episode. First, the feature
extractor extracts the photo and sketch features in the training episode; the network then learns the
metric relationship between them in the embedding space, and finally the meta-classifier classifies
the images and the mean entropy loss is calculated.
2.1 Mean entropy loss
Randomly sample K < N classes from the training set

$$D_{train} = \{(I_1^p, y_1), (I_1^s, y_1), \ldots, (I_N^p, y_N), (I_N^s, y_N)\}$$

with N classes to form a training episode, where $I^p$ and $I^s$ are photos and sketches
respectively, and $y$ is the corresponding label. In each training episode, all images
constitute the query set

$$Q = \{(I_1^p, y_1), (I_1^s, y_1), \ldots, (I_K^p, y_K), (I_K^s, y_K)\},$$

all the photos constitute the photo support set

$$S_p = \{(I_1^p, y_1), \ldots, (I_K^p, y_K)\}$$

and all the sketches constitute the sketch support set

$$S_s = \{(I_1^s, y_1), \ldots, (I_K^s, y_K)\}.$$

From the test set $D_{test}$, K photos are randomly sampled to form a photo test episode
$E_p$ and K sketches to form a sketch test episode $E_s$. Let $f(\cdot)$ denote the embedding
function implemented by the feature extractor. Given a query set image $I_c$, the
corresponding feature vector is $f(I_c)$. The Euclidean distances between it and the photo
$I_i^p$ ($i \in 1, \ldots, K$) in the photo support set $S_p$ and the sketch $I_i^s$
($i \in 1, \ldots, K$) in the sketch support set $S_s$ are:

$$d_{ci}^p = \|f(I_c) - f(I_i^p)\|_2, \quad d_{ci}^s = \|f(I_c) - f(I_i^s)\|_2 \tag{1}$$

where $\|\cdot\|_2$ is the Euclidean distance. Therefore, the probability that the query
image $I_c$ belongs to class $y_i$ is:
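$$p(y = y_i \mid I_c) = \frac{\exp\left(-\frac{1}{2}\left(d_{ci}^p + d_{ci}^s\right)\right)}{\sum_{j=1}^{K} \exp\left(-\frac{1}{2}\left(d_{cj}^p + d_{cj}^s\right)\right)} \tag{2}$$

The mean entropy loss is obtained:

$$L_{mean} = \frac{1}{|Q|} \sum_{(I_j, y_j) \in Q} \left[ -\ln M_{cj} + \ln \sum_{j=1}^{K} \exp\left(-\frac{1}{2}\left(d_{cj}^p + d_{cj}^s\right)\right) \right] \tag{3}$$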
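As an illustration of Eqs. (1) to (3), the following NumPy sketch builds one episode, computes the photo and sketch distances, and turns them into class probabilities. It is not the authors' implementation: the embeddings are random stand-ins for $f(\cdot)$, the function names are hypothetical, and the loss is assumed to be the average negative log-probability of the true class under Eq. (2), because the term $M_{cj}$ in Eq. (3) is not defined in the text.

```python
import numpy as np

def sample_episode(photo_feats, sketch_feats, k, rng):
    """Pick K of the N classes and return their photo and sketch features."""
    n = photo_feats.shape[0]
    classes = rng.choice(n, size=k, replace=False)
    return photo_feats[classes], sketch_feats[classes], classes

def class_probabilities(query, support_photo, support_sketch):
    """Eq. (1)-(2): softmax over -0.5 * (d^p + d^s) for each query row."""
    # Pairwise Euclidean distances, shape (num_queries, K)
    d_p = np.linalg.norm(query[:, None, :] - support_photo[None, :, :], axis=-1)
    d_s = np.linalg.norm(query[:, None, :] - support_sketch[None, :, :], axis=-1)
    logits = -0.5 * (d_p + d_s)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability only
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def mean_nll(probs, labels):
    """Assumed loss: mean negative log-probability of the true class."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

rng = np.random.default_rng(0)
N, K, DIM = 10, 5, 16  # hypothetical sizes, not taken from the paper

# Stand-in embeddings f(I): one photo and one sketch vector per class
photo_feats = rng.normal(size=(N, DIM))
sketch_feats = photo_feats + 0.1 * rng.normal(size=(N, DIM))

s_p, s_s, classes = sample_episode(photo_feats, sketch_feats, K, rng)
query = np.concatenate([s_p, s_s], axis=0)             # Q holds all photos and sketches
labels = np.concatenate([np.arange(K), np.arange(K)])  # episode-local class indices
probs = class_probabilities(query, s_p, s_s)
loss = mean_nll(probs, labels)
```

Averaging the photo and sketch distances means a query is pulled toward both views of its class at once, which is one way to read the factor 1/2 in Eq. (2).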
2.2 Cross domain adaptive loss
To address the data shift between the training data and the test data, a co-modal
adversarial loss is proposed:

$$\begin{aligned}
L_{adv}^{sm} ={} & \mathbb{E}[\log D_{dom}(S_p) + \log(1 - D_{dom}(E_p))] + \mathbb{E}[\log D_{dom}(E_p) + \log(1 - D_{dom}(S_p))] \\
& + \mathbb{E}[\log D_{dom}(S_s) + \log(1 - D_{dom}(E_s))] + \mathbb{E}[\log D_{dom}(E_s) + \log(1 - D_{dom}(S_s))]
\end{aligned} \tag{4}$$
where $D_{dom}(\cdot)$ is the discriminator. To address the modal difference between the
sketch and the photo, a cross modal adversarial loss is proposed:

$$\begin{aligned}
L_{adv}^{cm} ={} & \mathbb{E}[\log D_{dom}(S_p) + \log(1 - D_{dom}(E_s))] + \mathbb{E}[\log D_{dom}(E_s) + \log(1 - D_{dom}(S_p))] \\
& + \mathbb{E}[\log D_{dom}(S_s) + \log(1 - D_{dom}(E_p))] + \mathbb{E}[\log D_{dom}(E_p) + \log(1 - D_{dom}(S_s))]
\end{aligned} \tag{5}$$
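The two adversarial losses in Eqs. (4) and (5) can be evaluated as follows. This is a minimal sketch under stated assumptions: the expectation is taken as a batch mean, the discriminator outputs are hypothetical numbers in (0, 1) rather than the output of a trained network, and `EPS` and all function names are illustrative rather than taken from the paper.

```python
import numpy as np

EPS = 1e-12  # guards against log(0); an implementation detail, not from the paper

def pair_term(d_a, d_b):
    """E[log D(A) + log(1 - D(B))], with E taken as a batch mean."""
    return float(np.mean(np.log(d_a + EPS)) + np.mean(np.log(1.0 - d_b + EPS)))

def symmetric_pair(d_a, d_b):
    """The two mirrored terms that appear for each pair of sets in Eq. (4)/(5)."""
    return pair_term(d_a, d_b) + pair_term(d_b, d_a)

def co_modal_loss(d_sp, d_ss, d_ep, d_es):
    """Eq. (4): support set vs. test episode within the same modality."""
    return symmetric_pair(d_sp, d_ep) + symmetric_pair(d_ss, d_es)

def cross_modal_loss(d_sp, d_ss, d_ep, d_es):
    """Eq. (5): support set vs. test episode across modalities."""
    return symmetric_pair(d_sp, d_es) + symmetric_pair(d_ss, d_ep)

# Hypothetical discriminator outputs for K = 4 images per set
d_sp = np.array([0.9, 0.8, 0.7, 0.6])  # D_dom(S_p)
d_ss = np.array([0.6, 0.5, 0.5, 0.4])  # D_dom(S_s)
d_ep = np.array([0.3, 0.2, 0.4, 0.1])  # D_dom(E_p)
d_es = np.array([0.5, 0.5, 0.5, 0.5])  # D_dom(E_s)

l_sm = co_modal_loss(d_sp, d_ss, d_ep, d_es)
l_cm = cross_modal_loss(d_sp, d_ss, d_ep, d_es)
l_mda = l_sm + l_cm  # Eq. (6)
```

Because each mirrored pair expands to the same four per-set terms, `l_sm` and `l_cm` come out equal whenever one shared discriminator scores all four sets, as in this sketch. How the two losses are kept distinct in practice (for example, through separate discriminators) is not stated in this section.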
Combining formula (4) and formula (5), the final cross domain adaptive loss is obtained:

$$L_{mda} = L_{adv}^{sm} + L_{adv}^{cm} \tag{6}$$

The optimization goal of this article is:

$$\min L = L_{mean} + \alpha L_{mda} \tag{7}$$
where α is the balance hyperparameter between the mean entropy loss and the cross
domain adaptive loss.
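The way Eqs. (6) and (7) combine the individual terms can be written as a small helper. The function name and the numeric values below, including the value of α, are placeholders chosen for illustration and are not taken from the paper.

```python
def total_objective(l_mean: float, l_sm_adv: float, l_cm_adv: float, alpha: float) -> float:
    """Combine the loss terms as in Eq. (6) and Eq. (7)."""
    l_mda = l_sm_adv + l_cm_adv    # Eq. (6): cross domain adaptive loss
    return l_mean + alpha * l_mda  # Eq. (7): weighted sum to be minimized

# Placeholder values for illustration only
total = total_objective(l_mean=0.8, l_sm_adv=-2.0, l_cm_adv=-2.0, alpha=0.1)
```

A larger α gives the domain alignment terms more weight relative to the classification term.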
3 Experiments
3.1 Implementation details
The UoM-SGFS and PRIP-VSGC datasets are used to evaluate the effectiveness of this
method. Two experimental data sets are set up based on these two datasets. In the first data
set (S1), 450 sketch-photo pairs are used as the training set and 150 sketch-photo pairs are
used as the test set. In the second data set (S2), the setting is the same as in S1. The
MTCNN method is used to locate and align the faces in the images. Before the training process,
all images are cropped to a standard size of 256×256. In the training process, the featu (...truncated)