Deferred demosaicking: efficient first-person view drone video encoding



Journal of Real-Time Image Processing (2025) 22:101
https://doi.org/10.1007/s11554-025-01675-1

RESEARCH

Jakov Benjak1 · Daniel Hofman1 · Hrvoje Mlinarić1

Received: 19 June 2024 / Accepted: 31 March 2025
© The Author(s) 2025

Abstract

In the pursuit of effective real-time video transmission for First-Person View (FPV) drone systems, optimizing the encoding process is paramount. Traditional encoding methods, reliant on pre-encoding demosaicking, often fall short in balancing the trade-off between video quality and latency, essential for seamless real-time feedback. This work proposes a novel approach by deferring the demosaicking process to the decoder side, thereby encoding the rearranged Bayer pattern (RGGB) data directly. This deferment reduces the input data size by a factor of three, thereby achieving a more expeditious encoding process. The tailored encoder and decoder architecture ensures the accurate reconstruction of the full-color image on the decoder side. Through a comprehensive evaluation, leveraging a specialized video quality assessment framework designed for FPV drone footage, our findings illuminate the substantial benefits of our proposed method. Specifically, it achieves faster encoding times and reduced computational overhead, pivotal for low-latency applications. Furthermore, this study opens avenues for integrating advanced encoding techniques into commercial FPV drone systems, potentially enriching user experiences across various applications. Our research not only addresses a critical gap in real-time video transmission but also sets the stage for future exploration into optimizing encoding methodologies for the next generation of FPV drone technologies.
Keywords High efficiency video coding · Computational efficiency · Image processing · Data compression · UAV · Demosaicking

* Daniel Hofman · Jakov Benjak · Hrvoje Mlinarić
1 Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia

1 Introduction

The rapid advancement in drone technology has propelled the utilization of First-Person View (FPV) systems in a myriad of applications ranging from recreational drone racing to professional surveillance and inspection tasks. A pivotal aspect that governs the effectiveness and user satisfaction in these applications is the real-time transmission of high-quality video feeds from drones to the user’s goggles [1]. However, the traditional video encoding and transmission schemes often struggle to meet the low-latency and high-quality requirements intrinsic to real-time FPV drone operations.

The core of this challenge lies in the first step of the video encoding process: demosaicking the raw image data captured by the drone’s camera sensor. Most modern cameras use a single light sensor equipped with a color filter array (CFA) to gather color information in images [2]. Demosaicking is the process of interpolating the Bayer pattern image data (Fig. 1) to produce a full-color image. Traditionally, this process is performed prior to encoding the video data, which then undergoes compression before being transmitted to the decoder. However, this conventional workflow induces a substantial computational overhead due to the enlargement of data size post-demosaicking, thereby increasing the encoding latency and affecting the reconstructed video quality.

In light of the aforementioned challenges, this paper introduces a novel encoding paradigm that defers the demosaicking process to the decoder side.
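The data-size argument can be made concrete with a small sketch. The Python code below is an illustration only, not the encoder described in this paper: the helper names `split_rggb` and `merge_rggb`, and the choice to stack both green samples of each 2x2 cell into a single double-height plane, are assumptions made for the example. The sample count for the demosaicked frame further assumes a full-resolution RGB frame (three samples per pixel) at the same bit depth as the raw data.

```python
# Assumed RGGB layout (helper names are hypothetical, not from the paper):
#   even rows:  R G R G ...
#   odd rows:   G B G B ...

def split_rggb(mosaic):
    """Split a Bayer RGGB mosaic (list of rows) into R, G and B planes.

    Both green samples of every 2x2 cell go into one plane that has twice
    as many rows as the R and B planes. This packing is an assumption made
    for illustration only.
    """
    height, width = len(mosaic), len(mosaic[0])
    if height % 2 or width % 2:
        raise ValueError("mosaic dimensions must be even")
    r = [row[0::2] for row in mosaic[0::2]]   # top-left sample of each cell
    b = [row[1::2] for row in mosaic[1::2]]   # bottom-right sample of each cell
    g = []
    for even_row, odd_row in zip(mosaic[0::2], mosaic[1::2]):
        g.append(even_row[1::2])              # green on the red row
        g.append(odd_row[0::2])               # green on the blue row
    return r, g, b


def merge_rggb(r, g, b):
    """Inverse of split_rggb: rebuild the mosaic that a decoder would demosaick."""
    mosaic = []
    for i in range(len(r)):
        even_row, odd_row = [], []
        for j in range(len(r[i])):
            even_row += [r[i][j], g[2 * i][j]]
            odd_row += [g[2 * i + 1][j], b[i][j]]
        mosaic += [even_row, odd_row]
    return mosaic


def sample_count(plane):
    """Number of samples stored in a plane given as a list of rows."""
    return sum(len(row) for row in plane)


# Tiny 4x4 example frame; the values are just sample indices.
frame = [[y * 4 + x for x in range(4)] for y in range(4)]
r, g, b = split_rggb(frame)

raw_samples = sum(sample_count(p) for p in (r, g, b))  # one sample per pixel
rgb_samples = 3 * len(frame) * len(frame[0])           # three samples per pixel
ratio = rgb_samples / raw_samples                      # encoder input size ratio
```

Because the split only reorders samples, `merge_rggb(r, g, b)` returns the original mosaic, so any interpolation can be postponed until after decoding. If the demosaicked frame were chroma-subsampled before encoding, the ratio would be smaller than in this full-resolution comparison.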
By directly encoding the raw Bayer pattern (RGGB) data, the proposed scheme significantly reduces the input data size, effecting a threefold reduction. This size reduction, in turn, expedites the encoding process and lowers the encoding latency, which is essential for real-time video transmission in FPV drone systems.

Fig. 1 Bayer pattern: raw data captured by the camera sensors, and three planes featuring reorganized colors

Additionally, our novel encoding paradigm not only addresses the aforementioned challenges but also enhances the efficiency of the encoder side. By shifting the demosaicking process to the decoder side, we significantly reduce the computational burden on the encoder, making it more lightweight. This aspect holds particular significance in systems reliant on battery power, where minimizing energy consumption is of high importance. Given that video encoding typically consumes a substantial amount of battery power, our method moves part of this energy-intensive work, the demosaicking step, to the decoder side. Consequently, the encoder can operate more efficiently, prolonging battery life. Moreover, since the decoder can typically be connected to a power source via a cable, this approach ensures uninterrupted operation, which is particularly crucial for real-time video transmission in FPV drone systems.

To evaluate the effectiveness of the proposed encoding scheme, this research also introduces a specialized video quality evaluation framework tailored for FPV drone footage. The framework leverages Region Of Interest (ROI) encoding to offer a nuanced assessment of video quality, essential for FPV drone videos where different focus areas in the transmitted video are often subject to ROI-based encoding. An example of an ROI encoded video frame is shown in Fig. 2.
Encoding artifacts can be much more clearly seen on the left side of the emphasized part of the image, as that area is encoded using higher quantization parameter (QP) values. Performing a conventional quality assessment fails to provide an accurate objective quality metric, as it does not prioritize image regions within the specified ROI. For FPV pilots, areas outside the ROI hold less significance and should not be evaluated as stringently as the focused region.

Fig. 2 An example of an ROI encoded video frame, displaying noticeably lower quality on the left side of the highlighted section

The tailored encoder and decoder architecture presented in this work not only ensures the efficient encoding of raw data but also the accurate reconstruction of full-color images on the decoder side. Through thorough evaluation, including the use of the specialized video quality evaluation framework, we highlight the benefits of the deferred demosaicking approach in achieving faster encoding times while maintaining high video quality. The results from this study point to a promising path for improving real-time video transmission capabilities in FPV drone systems, thereby boosting their performance in surveillance, drone racing, and immersive user experiences.

2 Related work

The domain of video encoding has seen various approaches aimed at optimizing latency, computational complexity, and video quality. While research specifically focused on FPV drone videos is relatively new, the techniques developed for traditional video encoding can be adapt (...truncated)