Image processing with Optical matrix vector multipliers implemented for encoding and decoding tasks

https://www.nature.com/articles/s41377-025-01904-z.pdf

Kim et al. Light: Science & Applications (2025) 14:248
https://doi.org/10.1038/s41377-025-01904-z
www.nature.com/lsa

ARTICLE Open Access

Minjoo Kim¹, Yelim Kim¹ and Won Il Park¹ ✉

Abstract

This study introduces an optical neural network (ONN)-based autoencoder for efficient image processing, utilizing specialized optical matrix-vector multipliers for both encoding and decoding tasks. To address the challenges in efficient decoding, we propose a method that optimizes output processing through scalar multiplications, enhancing performance in generating higher-dimensional outputs. By employing on-system iterative tuning, we mitigate hardware imperfections and noise, progressively improving image reconstruction accuracy to near-digital quality. Furthermore, our approach supports noise reduction and optical image generation, enabling models such as denoising autoencoders, variational autoencoders, and generative adversarial networks. Our results demonstrate that ONN-based systems have the potential to surpass the energy efficiency of traditional electronic systems, enabling real-time, low-power image processing in applications such as medical imaging, autonomous vehicles, and edge computing.

Introduction

Optical image processing is fundamental to applications in autonomous navigation, security, defense, and medical diagnostics [1–4]. It begins with capturing images using cameras and converting these analog visuals into digital data, followed by image recognition, reconstruction, interpretation, and decision-making. However, digital images are typically large and contain extraneous information, such as background patterns and noise, necessitating preprocessing to reduce dimensionality while retaining essential information.
Correspondence: Won Il Park
¹ Division of Materials Science and Engineering, Hanyang University, Seoul, Republic of Korea
These authors contributed equally: Minjoo Kim, Yelim Kim

Image encoders, including autoencoders (a popular deep neural network, or DNN, architecture), transform high-dimensional input into a compact latent space by compressing the data through a series of layers [5–9]. Autoencoders also employ decoder networks to reverse this process, reconstructing the compressed data back to its original form [10–12]. Decoders are trained to closely reproduce the original input, which, combined with encoder networks, makes autoencoders valuable for tasks like data compression, noise reduction, and feature extraction. In generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), decoders generate realistic and contextually appropriate images, crucial for image enhancement, data augmentation, and creative applications in art and film [13–15].

Despite significant advancements in computational efficiency, driven by both software and hardware improvements [16,17], the electronic processing of DNNs using accelerators faces challenges such as high energy consumption and limitations in scaling parallelism and speed efficiency. Optical neural networks (ONNs) offer an alternative with effective operations such as matrix-vector multiplication (MVM) and nonlinear activation of optical signals [8,18–23]. Fully optical, multilayer, nonlinear preprocessors for 2D image processing have been implemented using components such as spatial light modulators (SLMs), liquid crystal displays (LCDs), diffractive optical elements (DOEs), nonlinear optical materials, and saturable amplifiers (e.g., image intensifiers), enabling ONNs to efficiently compress and process large images with lower latency [5,24–27].
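To make the encoder/decoder picture concrete in terms of MVM, the following is a minimal sketch of a purely linear autoencoder, assuming a single weight matrix whose transpose serves as the decoder. The weights, the 4-pixel input, and the function names `encode` and `decode` are all hypothetical and chosen for illustration; none of them come from this article, and no nonlinearity, training, or optical hardware is modeled. The decoder is written as a sum of scalar-weighted patterns, which is only one of several mathematically equivalent ways to evaluate an MVM.

```python
# Minimal linear autoencoder built from two matrix-vector multiplications (MVMs).
# All values are illustrative assumptions; nothing here is taken from the paper.

def encode(W, x):
    """Row view of an MVM: each latent value is the dot product of one row of W with x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def decode(W, z):
    """Decode with the transpose of W, written as a sum of latent-scaled rows of W.

    W^T z = sum_j z[j] * W[j]: each latent scalar scales one stored pattern,
    and the scaled patterns are accumulated into the wider output vector.
    """
    out = [0.0] * len(W[0])
    for zj, pattern in zip(z, W):
        for i, p in enumerate(pattern):
            out[i] += zj * p
    return out

# Hypothetical 2x4 encoder matrix with orthonormal rows.
W = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
]

x = [3.0, 3.0, 1.0, 1.0]   # 4-pixel "image" lying in the span of W's rows
z = encode(W, x)           # 2-element latent code (compression: 4 -> 2)
x_hat = decode(W, z)       # 4-element reconstruction (expansion: 2 -> 4)

print(z)      # [4.0, 2.0]
print(x_hat)  # [3.0, 3.0, 1.0, 1.0]
```

The reconstruction is exact here only because the input was chosen to lie in the span of the two rows. For a general input, this transpose decoder returns the projection of the input onto that span, which is why practical autoencoders learn their weights from data.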
© The Author(s) 2025. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Current ONN systems have shown superior performance in encoding operations, using optical analog computing to compress image data before digital conversion [19,28–30]. However, decoding requires a similar computational resource, highlighting the need for advancements in both encoding and decoding. Given that the ultimate goal of ONNs is to implement multilayer neural networks primarily through optical methods, or at least in combination with analog approaches [5,6], developing effective optical decoders is crucial to fully utilize the potential of autoencoders in handling complex real-world visual data. Although research has been conducted on 2D image encoding and decoding using diffractive deep neural networks (D²NNs) [31,32], these systems inherently require coherent light as input. This presents a fundamental limitation because actual incoherent image inputs must undergo an optoelectronic conversion process to generate coherent input light. This additional step introduces complexity and potential latency, undermining the benefits of optical processing.
Direct encoding of natural images [5,33] and subsequent decoding with high accuracy and low latency are essential for applications like autonomous navigation, medical diagnostics, and defense, where fast and reliable image processing directly impacts safety and performance.

In this work, we present an ONN-based autoencoder that leverages optical matrix-vector multipliers for both encoding and decoding tasks. While previous MVM strategies efficiently handle scenarios where input dimensions exceed output dimensions [5,31,33–36], they exhibit limitations when dealing with larger output sizes. To overcome this, we propose a method for optimizing output processing via scalar multiplications, enhancing the efficiency of higher-dimensional output generation. We introduced on-system iterative tuning to mitigate hardware imperfections and noise, which gradually improved image reconstruction accuracy, nearing the quality of digital processors. Additionally, we extended our approach to noise reduction and optical image generation, enabling models such as the denoising autoencoder (DAE), VAE, and GAN. Our analysis indicates that, with further optimizations, ONN-based autoencoders have the potential to significantly exceed the energy efficiency of electronic systems. These advancemen (...truncated)