CRPGAN: Learning image-to-image translation of two unpaired images by cross-attention mechanism and parallelization strategy

The article PDF can be downloaded here:

https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0280073&type=printable

PLOS ONE | Research Article

Long Feng, Guohua Geng*, Qihang Li, Yi Jiang, Zhan Li, Kang Li

School of Information Science and Technology, Northwest University, Xi’an, China

Citation: Feng L, Geng G, Li Q, Jiang Y, Li Z, Li K (2023) CRPGAN: Learning image-to-image translation of two unpaired images by cross-attention mechanism and parallelization strategy. PLoS ONE 18(1): e0280073. https://doi.org/10.1371/journal.pone.0280073

Editor: Xiangjie Kong, Zhejiang University of Technology, CHINA

Received: September 27, 2022

Abstract

Unsupervised image-to-image translation (UI2I) aims to find a mapping between a source domain and a target domain from unpaired training data. Previous methods cannot effectively capture the differences between the source and target domains at different scales. As a result, they often generate low-quality images with noise, distortion, and other artifacts that do not match human visual perception, and they have high time complexity. To address these problems, we propose a multi-scale training structure and a progressive growth generator for the UI2I task. Our method refines the generated images from global structures to local details by continuously adding new convolution blocks, and it shares network parameters both across scales and within the same scale. Finally, we propose a new Cross-CBAM mechanism (CRCBAM), which uses a cross structure of multi-layer spatial attention and channel attention to generate more refined stylized images. Experiments on our collected Opera Face dataset and on the open datasets Summer↔Winter, Horse↔Zebra, and Photo↔Van Gogh show that the proposed algorithm outperforms other state-of-the-art algorithms.
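The abstract describes CRCBAM only as a cross structure of multi-layer spatial and channel attention, without specifying the arrangement. As a point of reference, the NumPy sketch below shows the two building blocks of the standard CBAM module (channel attention followed by spatial attention). It is not the authors' CRCBAM: the sequential order, the weight shapes, the reduction ratio r, and the 7×7 kernel are assumptions taken from the usual CBAM formulation, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Standard CBAM channel attention on a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r): the two layers of
    an MLP shared by the average-pooled and max-pooled descriptors.
    """
    def shared_mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # FC -> ReLU -> FC

    avg = feat.mean(axis=(1, 2))  # (C,) global average pooling
    mx = feat.max(axis=(1, 2))    # (C,) global max pooling
    weights = sigmoid(shared_mlp(avg) + shared_mlp(mx))  # (C,) values in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat, kernel):
    """Standard CBAM spatial attention; kernel has shape (2, k, k), k odd."""
    # Pool across channels to get a 2-channel (average, max) descriptor map.
    desc = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, w = feat.shape
    logits = np.zeros((h, w))
    for i in range(h):  # naive "same" convolution (cross-correlation)
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]


def cbam(feat, w1, w2, kernel):
    # Channel attention first, then spatial attention, as in the original CBAM.
    return spatial_attention(channel_attention(feat, w1, w2), kernel)


# Usage with random weights (illustrative only, nothing is trained here).
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
print(out.shape)  # (8, 16, 16)
```

Because both attention maps lie in (0, 1), the module only rescales features; a cross arrangement such as the one CRCBAM describes would change how these two blocks are combined, not the blocks themselves.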
Accepted: December 20, 2022

Published: January 6, 2023

Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0280073

Copyright: © 2023 Feng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: The Summer2Winter dataset can be obtained from https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/summer2winter_yosemite.zip. The Horse2Zebra dataset can be obtained from https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/horse2zebra.zip. The Photo2Van Gogh dataset can be obtained from https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/vangogh2photo.zip. The Grumpifycat dataset can be obtained from https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/grumpifycat.zip.

Funding: This research was funded by the National Key Research and Development Program of China (2020YFC1523301 and 2019YFC1521103), National Natural Science Foundation of China (62271393), Key Research and Development Program of Shaanxi Province (2019ZDLSF07-02, 2019ZDLGY10-01 and 2021GY-171), National Natural Science Foundation of China (61731015), and Key Research and Development Program of Qinghai Province (2020-SF-142). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Unsupervised image-to-image translation (UI2I) translates an image from one domain to another, changing the appearance of a given image while keeping its geometry unchanged. Examples include translating a horse into a zebra, a low-resolution image into a high-resolution image, or a photograph into an art painting, and vice versa [1, 2]. UI2I has received considerable attention because of its excellent performance in areas such as image style transfer [3–7], colourisation [8], super-resolution [9, 10], dehazing [11], denoising [12], image synthesis [13], text-to-image synthesis [14], image generation [15, 16], and underwater image restoration [17]. In recent years, following the emergence of Generative Adversarial Networks (GANs) [18], many GAN-based methods have been proposed to solve UI2I tasks [19–24].

In UI2I tasks without paired training data, the main problem of GANs is that the adversarial loss [18] is unconstrained: many mapping functions exist between the source and target domains, which may lead to unstable training and failed image translation. To solve this problem, CycleGAN [3], DiscoGAN [25], and DualGAN [26] introduce the cycle-consistency loss [3] into the network and learn the reverse mapping from the target domain to the source domain under a reconstruction-consistency constraint. These methods usually require a large number of unpaired images for training, but such large unpaired collections are difficult to obtain. Therefore, few-shot and one-shot learning have attracted growing interest from researchers [27–29].
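The adversarial loss only compares distributions, so any mapping whose outputs look like target-domain images can fool the discriminator; the cycle-consistency loss adds the requirement that translating an image and then translating it back recovers the input. The sketch below is a minimal NumPy illustration of the L1 cycle-consistency term used by CycleGAN [3], assuming toy generators G: X → Y and F: Y → X written as plain functions. The names G_toy, F_inverse, and F_shifted and their linear maps are hypothetical; in CycleGAN this term is added, with a weight λ, to the adversarial losses of both generators.

```python
import numpy as np


def cycle_consistency_loss(x, y, G, F):
    """CycleGAN-style L1 cycle-consistency loss.

    G maps domain X to domain Y and F maps Y back to X; x and y are
    batches drawn from X and Y respectively.
    """
    forward = np.mean(np.abs(F(G(x)) - x))   # x -> G(x) -> F(G(x)) should return x
    backward = np.mean(np.abs(G(F(y)) - y))  # y -> F(y) -> G(F(y)) should return y
    return forward + backward


# Hypothetical toy "generators" on 1-D data, used only to exercise the loss.
def G_toy(a):
    return 2.0 * a + 1.0


def F_inverse(b):  # exact inverse of G_toy
    return (b - 1.0) / 2.0


def F_shifted(b):  # not an inverse of G_toy
    return b / 2.0


rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = rng.standard_normal(100)
good = cycle_consistency_loss(x, y, G_toy, F_inverse)  # close to 0
bad = cycle_consistency_loss(x, y, G_toy, F_shifted)   # approximately 1.5
print(good, bad)
```

With F_shifted, the forward cycle is off by 0.5 and the backward cycle by 1.0 for every sample, so the loss is about 1.5, while the exact inverse pair drives it to zero up to floating-point error.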
In one-shot unsupervised learning, the source and target domains each contain only one image, and the two images are unpaired. Unfortunately, one-shot and few-shot settings usually lead to severe overfitting, so solving the UI2I task with so few training samples is highly challenging. The recently proposed SinGAN [30] shows that the patches of a single image contain enough information to train a GAN model, so a large amount of information can be extracted from one image. However, SinGAN is limited to generating a specific data distribution and is not suitable for the UI2I task. Furthermore, SinGAN uses a serial multi-level structure that trains slowly, and its lack of constraints leads to blurred translation results. ConSinGAN [31] uses parallelism for the first time to improve training speed, but it still does not solve the blurring problem in UI2I. TuiGAN [32] takes full advantage of SinGAN’s multi-scale learning and uses a consistency loss to constrain the structural difference between the two images, which makes UI2I between two unpaired images possible. However, as with SinGAN, its serial structure makes training slow, and it cannot effectively capture the differences between the source and target domains at different scales due to the constant changing of the perceptual field to extract the underlyi (...truncated)
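The coarse-to-fine training that SinGAN, ConSinGAN, and TuiGAN rely on involves two decisions: how the image pyramid is built and which generator stages are updated while a given stage is trained. The Python sketch below assumes a SinGAN-style geometric pyramid; min_size, scale_factor, and the number of concurrently trained stages are illustrative parameters, not values from this paper, and the "concurrent" mode only loosely mirrors the idea behind ConSinGAN rather than reproducing its training procedure.

```python
import math


def scale_pyramid(height, width, min_size=25, scale_factor=0.75):
    """Coarse-to-fine (height, width) sizes for a SinGAN-style image pyramid.

    Each finer level is larger than the previous one by 1 / scale_factor,
    the finest level is the input size, and the shorter side of the coarsest
    level stays at or above roughly min_size.
    """
    short = min(height, width)
    # Largest n with short * scale_factor ** n >= min_size (at least 0).
    n = math.floor(math.log(min_size / short) / math.log(scale_factor))
    n = max(n, 0)
    sizes = []
    for level in range(n, -1, -1):  # coarse -> fine
        s = scale_factor ** level
        sizes.append((round(height * s), round(width * s)))
    return sizes


def trainable_stages(current, mode="serial", concurrent=3):
    """Indices of generator stages updated while stage `current` is trained.

    serial: only the newest stage; earlier stages stay frozen, as in
        SinGAN-style sequential training.
    concurrent: the newest `concurrent` stages are updated together,
        loosely following the idea behind ConSinGAN.
    """
    if mode == "serial":
        return [current]
    if mode == "concurrent":
        return list(range(max(0, current - concurrent + 1), current + 1))
    raise ValueError(f"unknown mode: {mode}")


sizes = scale_pyramid(256, 256)
print(len(sizes), sizes[0], sizes[-1])  # 9 (26, 26) (256, 256)
print(trainable_stages(5), trainable_stages(5, mode="concurrent"))  # [5] [3, 4, 5]
```

In the serial scheme every stage waits for the previous one to finish and then freezes it, which is the source of the slow training attributed above to SinGAN and TuiGAN; updating several stages together lets the coarser generators keep adapting as finer ones are added.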