CRPGAN: Learning image-to-image translation of two unpaired images by cross-attention mechanism and parallelization strategy
PLOS ONE
RESEARCH ARTICLE
Long Feng, Guohua Geng*, Qihang Li, Yi Jiang, Zhan Li, Kang Li
School of Information Science and Technology, Northwest University, Xi’an, China
*
OPEN ACCESS
Citation: Feng L, Geng G, Li Q, Jiang Y, Li Z, Li K (2023) CRPGAN: Learning image-to-image translation of two unpaired images by cross-attention mechanism and parallelization strategy. PLoS ONE 18(1): e0280073. https://doi.org/10.1371/journal.pone.0280073
Editor: Xiangjie Kong, Zhejiang University of
Technology, CHINA
Received: September 27, 2022
Abstract
Unsupervised image-to-image translation (UI2I) tasks aim to find a mapping between the source and the target domains from unpaired training data. Previous methods cannot effectively capture the differences between the source and the target domains at different scales, which often leads to poor-quality generated images with noise, distortion, and other artifacts that do not match human visual perception, and they have high time complexity. To address these problems, we propose a multi-scale training structure and a progressive growth generator method to solve the UI2I task. Our method refines the generated images from global structures to local details by continuously adding new convolution blocks, and it shares network parameters across different scales as well as within the same scale. Finally, we propose a new Cross-CBAM mechanism (CRCBAM), which uses a multi-layer cross structure of spatial attention and channel attention to generate more refined style images. Experiments on our collected Opera Face dataset and on the open datasets Summer↔Winter, Horse↔Zebra, and Photo↔Van Gogh show that the proposed algorithm is superior to other state-of-the-art algorithms.
Accepted: December 20, 2022
Published: January 6, 2023
Peer Review History: PLOS recognizes the
benefits of transparency in the peer review
process; therefore, we enable the publication of
all of the content of peer review and author
responses alongside final, published articles. The
editorial history of this article is available here:
https://doi.org/10.1371/journal.pone.0280073
Copyright: © 2023 Feng et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: The Summer2Winter dataset can be obtained from the website (https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/summer2winter_yosemite.zip). The Horse2Zebra dataset can be obtained from the website (https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/horse2zebra.zip). The Photo2Van Gogh dataset can be obtained from the website (https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/vangogh2photo.zip). The Grumpifycat dataset can be obtained from the website (https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/grumpifycat.zip).
Funding: This research was funded by the National Key Research and Development Program of China (2020YFC1523301 and 2019YFC1521103), National Natural Science Foundation of China (62271393), Key Research and Development Program of Shaanxi Province (2019ZDLSF07-02, 2019ZDLGY10-01 and 2021GY-171), National Natural Science Foundation of China (61731015), Key Research and Development Program of Qinghai Province (2020-SF-142). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Unsupervised image-to-image translation (UI2I) translates an image from one domain to another, changing the appearance of a given image while keeping its geometry unchanged. Examples include translating a horse to a zebra, a low-resolution image to a high-resolution image, or a photograph to an art painting, and vice versa [1, 2]. UI2I has received a lot of attention due to its excellent performance in areas such as image style transfer [3–7], colourisation [8], super-resolution [9, 10], dehazing [11], denoising [12], image synthesis [13], text-to-image synthesis [14], image generation [15, 16], and underwater image restoration [17].
In recent years, with the emergence of Generative Adversarial Networks (GANs) [18], many GAN-based works have been proposed to solve UI2I tasks [19–24]. In UI2I tasks without paired training data, the main problem of GANs is that the adversarial loss [18] is unconstrained and many mapping functions exist between the source and target domains, which may lead to unstable training and failure of image translation. To solve this problem, CycleGAN [3], DiscoGAN [25] and DualGAN [26] introduce the cycle-consistency loss [3] into the network model and learn the reverse mapping from the target to the source domain under a reconstruction consistency constraint.
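As a minimal sketch of the cycle-consistency idea described above, the following Python snippet computes an L1 round-trip reconstruction error for a pair of mapping functions. The function names, the flattened-list image representation, and the toy generators are assumptions made for illustration only; they are not taken from CycleGAN [3] or from the method proposed in this paper.

```python
from typing import Callable, List

# Assumption: an image is a flat list of pixel values; real models use tensors.
Image = List[float]


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    x_a: Image,
    x_b: Image,
    g_ab: Callable[[Image], Image],
    g_ba: Callable[[Image], Image],
) -> float:
    """Sum of the L1 reconstruction errors of both round trips."""
    forward_cycle = l1_distance(g_ba(g_ab(x_a)), x_a)   # A -> B -> A
    backward_cycle = l1_distance(g_ab(g_ba(x_b)), x_b)  # B -> A -> B
    return forward_cycle + backward_cycle


# Hypothetical stand-ins for trained generators, for illustration only.
def brighten(img: Image) -> Image:
    return [p + 0.5 for p in img]


def darken(img: Image) -> Image:
    return [p - 0.5 for p in img]


def identity(img: Image) -> Image:
    return list(img)
```

When the two mappings are exact inverses of each other (here `brighten` and `darken`), both round trips reproduce their inputs and the loss is zero. Replacing one of them with `identity` leaves a residual error, which is the quantity the constraint penalises during training.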
The above methods usually require a large number of unpaired images for training. However, massive numbers of unpaired images are difficult to obtain. Therefore, few-shot and one-shot learning have attracted increasing interest from researchers [27–29]. In one-shot unsupervised learning, the source and target domains each have only one image, and these two images are unpaired. Unfortunately, one-shot and few-shot training usually leads to severe overfitting of the model. Solving the UI2I task with a small number of training samples therefore remains a great challenge.
The recently proposed SinGAN [30] shows that the patches of a single image contain enough information to train a GAN model, so a large amount of information can be extracted from a single image. Unfortunately, SinGAN is limited to generating a specific data distribution and is not suitable for the UI2I task.
Furthermore, SinGAN has a serial multi-level model structure, so its training is slow, and, due to the lack of constraints, it is limited to generating specific data distributions, which results in blurred image translation results. ConsinGAN [31] uses parallelism for the first time to improve the training speed, but it still does not solve the blurring problem of image translation in UI2I. TuiGAN [32] takes full advantage of SinGAN's multi-scale learning of translated images and uses a consistency loss to limit the structural difference between the two images, which makes UI2I between two unpaired images possible.
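The multi-scale training used by SinGAN-style models can be illustrated by listing the image resolution at each scale, from the coarsest to the original. The sketch below assumes a constant downscaling ratio between consecutive scales; the function name, the `min_side` parameter, and the example numbers are hypothetical and do not reproduce the settings of SinGAN [30], TuiGAN [32], or the proposed method.

```python
from typing import List, Tuple


def pyramid_sizes(
    height: int, width: int, num_scales: int, min_side: int = 25
) -> List[Tuple[int, int]]:
    """Return (height, width) for each scale, coarsest first, original last.

    Assumption: consecutive scales differ by one constant ratio, chosen so
    that the shorter image side equals `min_side` at the coarsest scale.
    """
    if num_scales < 1:
        raise ValueError("num_scales must be at least 1")
    shorter = min(height, width)
    if not 1 <= min_side <= shorter:
        raise ValueError("min_side must lie between 1 and the shorter side")
    if num_scales == 1:
        return [(height, width)]

    # Constant per-scale ratio in (0, 1]; applied (num_scales - 1) times
    # it shrinks the shorter side from its original length to min_side.
    ratio = (min_side / shorter) ** (1.0 / (num_scales - 1))

    sizes = []
    for n in range(num_scales):
        # Scale n is (num_scales - 1 - n) downscaling steps from the original.
        factor = ratio ** (num_scales - 1 - n)
        sizes.append((max(1, round(height * factor)),
                      max(1, round(width * factor))))
    return sizes


if __name__ == "__main__":
    for scale, (h, w) in enumerate(pyramid_sizes(256, 256, 5, min_side=16)):
        print(f"scale {scale}: {h} x {w}")
```

In a serial scheme, one generator per scale is trained in order along this list, whereas a parallel scheme trains several of these scales at the same time.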
However, like SinGAN, TuiGAN is limited by the serial structure of its model, which results in slow
training, and it cannot effectively capture the differences between the source and target domains
at different scales due to the constant changing of the perceptual field to extract the underlyi (...truncated)