Deferred demosaicking: efficient first-person view drone video encoding
Journal of Real-Time Image Processing
(2025) 22:101
https://doi.org/10.1007/s11554-025-01675-1
RESEARCH
Deferred demosaicking: efficient first-person view drone video encoding
Jakov Benjak¹ · Daniel Hofman¹ · Hrvoje Mlinarić¹
Received: 19 June 2024 / Accepted: 31 March 2025
© The Author(s) 2025
Abstract
In the pursuit of effective real-time video transmission for First-Person View (FPV) drone systems, optimizing the encoding
process is paramount. Traditional encoding methods, which demosaick the raw sensor data before encoding, often fall short in
balancing video quality against latency, which is essential for seamless real-time feedback. This work proposes a novel approach
that defers demosaicking to the decoder side and encodes the rearranged Bayer pattern (RGGB) data directly. Because each
pixel of the raw mosaic carries one color sample instead of the three produced by demosaicking, this deferment reduces the
input data size threefold and thereby speeds up encoding. The tailored encoder and decoder architecture ensures accurate
reconstruction of the full-color image on the decoder side. A comprehensive evaluation, using a specialized video quality
assessment framework designed for FPV drone footage, demonstrates the benefits of the proposed method: it achieves faster
encoding and reduced computational overhead, both pivotal for low-latency applications. Furthermore, this study opens avenues
for integrating advanced encoding techniques into commercial FPV drone systems, potentially enriching user experiences across
various applications. Our research not only addresses a critical gap in real-time video transmission but also sets the stage
for future work on optimizing encoding methods for the next generation of FPV drone technologies.
Keywords High efficiency video coding · Computational efficiency · Image processing · Data compression · UAV ·
Demosaicking
1 Introduction
The rapid advancement in drone technology has propelled
the use of First-Person View (FPV) systems in a wide
range of applications, from recreational drone racing to
professional surveillance and inspection tasks. A pivotal
aspect that governs the effectiveness and user satisfaction
in these applications is the real-time transmission of
high-quality video feeds from the drone to the user's goggles
[1]. However, traditional video encoding and transmission
schemes often struggle to meet the low-latency and
high-quality requirements intrinsic to real-time FPV drone
operations. The core of this challenge lies in the first step of
the video encoding process: demosaicking the raw image
data captured by the drone's camera sensor.

* Corresponding author: Daniel Hofman
¹ Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia
Most modern cameras use a single image sensor equipped
with a color filter array (CFA) to capture color information
in images [2]. With a Bayer CFA, each sensor pixel records only one of the red, green, or blue components. Demosaicking
is the process of interpolating this Bayer pattern image data (Fig. 1) to produce a full-color image. Traditionally, demosaicking
is performed before the video data are compressed and transmitted to the decoder. However, this
conventional workflow induces substantial computational
overhead: after demosaicking, every pixel carries three color samples instead of one, so the encoder must process three
times as much data. This increases the encoding latency and can adversely affect the reconstructed video quality.
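To make this data enlargement concrete, the following minimal Python sketch implements a deliberately crude demosaicking scheme, assumed here purely for illustration and not the interpolation used in practice or in this work: every pixel of each 2×2 RGGB cell receives the cell's red sample, the mean of its two green samples, and its blue sample. The names `demosaic_per_cell` and `rgb_sample_count` are hypothetical. Whatever interpolation is used, the output carries three samples per pixel where the input carried one.

```python
from typing import List, Tuple

Mosaic = List[List[int]]
RGBImage = List[List[Tuple[float, float, float]]]


def demosaic_per_cell(mosaic: Mosaic) -> RGBImage:
    """Crude RGGB demosaic (illustrative assumption, not a production method).

    Each 2x2 cell is laid out as   R  G1
                                   G2 B
    and all four output pixels of the cell get (R, mean(G1, G2), B).
    """
    h, w = len(mosaic), len(mosaic[0])
    if h % 2 or w % 2:
        raise ValueError("mosaic dimensions must be even")
    out: RGBImage = [[(0.0, 0.0, 0.0)] * w for _ in range(h)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            r = float(mosaic[y][x])
            g = (mosaic[y][x + 1] + mosaic[y + 1][x]) / 2.0
            b = float(mosaic[y + 1][x + 1])
            for dy in (0, 1):
                for dx in (0, 1):
                    out[y + dy][x + dx] = (r, g, b)
    return out


def rgb_sample_count(img: RGBImage) -> int:
    """Number of color samples in a full-color image: three per pixel."""
    return 3 * sum(len(row) for row in img)
```

For a 4×4 mosaic (16 samples), the demosaicked result holds 16 pixels × 3 = 48 samples, which is the threefold growth the encoder has to absorb in the conventional workflow.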
In light of these challenges, this paper
introduces a novel encoding paradigm that defers the demosaicking process to the decoder side. By directly encoding
the raw Bayer pattern (RGGB) data, the proposed scheme
reduces the input data size threefold. This size reduction, in turn, speeds up the encoding process and lowers the
encoding latency, which is essential for real-time video transmission in FPV drone systems.

Fig. 1 Bayer pattern: raw data captured by the camera sensor, and
three planes featuring reorganized colors

Shifting demosaicking to the decoder side also makes the encoder more lightweight, because the encoder no longer
performs the interpolation and processes only a third of the data. This is particularly significant in battery-powered
systems, where minimizing energy consumption is of high importance. Since video encoding typically consumes a substantial
share of the battery budget, moving the demosaicking step to the decoder and reducing the amount of data the encoder must
compress lets the encoder operate more efficiently and prolongs battery life. Moreover, because the decoder can typically be
connected to a power source via a cable, the additional decoder-side work is not limited by battery capacity, which is
particularly important for uninterrupted real-time video transmission in FPV drone systems.
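The threefold reduction can be checked with a short sketch. Fig. 1 shows the mosaic reorganized into three planes, but the exact layout is not specified in this section, so the sketch assumes one possible arrangement: a quarter-resolution R plane, a quarter-resolution B plane, and a G plane of size (H/2)×W that places the two green samples of each 2×2 cell side by side. The names `split_rggb`, `sample_count`, and `demosaicked_sample_count` are hypothetical.

```python
from typing import List, Tuple

Plane = List[List[int]]


def split_rggb(mosaic: Plane) -> Tuple[Plane, Plane, Plane]:
    """Rearrange an RGGB mosaic into R, G and B planes (assumed layout).

    Each 2x2 cell   R  G1
                    G2 B
    contributes one sample to R and B and two side-by-side samples to G.
    """
    h, w = len(mosaic), len(mosaic[0])
    if h % 2 or w % 2:
        raise ValueError("mosaic dimensions must be even")
    r = [[mosaic[y][x] for x in range(0, w, 2)] for y in range(0, h, 2)]
    g = [[v for x in range(0, w, 2) for v in (mosaic[y][x + 1], mosaic[y + 1][x])]
         for y in range(0, h, 2)]
    b = [[mosaic[y + 1][x + 1] for x in range(0, w, 2)] for y in range(0, h, 2)]
    return r, g, b


def sample_count(*planes: Plane) -> int:
    """Total number of samples across the given planes."""
    return sum(len(p) * len(p[0]) for p in planes)


def demosaicked_sample_count(height: int, width: int) -> int:
    """Samples in a full-color frame: three per pixel."""
    return 3 * height * width
```

For a 1920×1080 sensor, the three planes together hold 1920 × 1080 = 2,073,600 samples, the same as the raw mosaic, whereas the demosaicked RGB frame holds 3 × 2,073,600 = 6,220,800 samples. The rearrangement adds no data; it only groups same-color samples so that neighboring values in each plane are spatially correlated, which a block-based video encoder can exploit.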
To evaluate the effectiveness of the proposed encoding
scheme, this research also introduces a specialized video
quality evaluation framework tailored for FPV drone footage. The framework leverages Region of Interest (ROI)
encoding to offer a more nuanced assessment of video quality.
This is essential for FPV drone videos, where different focus areas of
the transmitted video are often encoded at different quality levels using ROI-based encoding. An example of an ROI-encoded video frame is shown
in Fig. 2. Encoding artifacts are much more visible
on the left side of the highlighted part of the image, because that region
is encoded with higher quantization parameter (QP) values. A conventional quality assessment does not provide an accurate objective quality metric in this setting, because it does not prioritize image regions
within the specified ROI. For FPV pilots, areas outside the
ROI are less significant and should not be evaluated as
stringently as the focused region.
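The idea of weighting errors by region can be sketched as an ROI-weighted PSNR. This is an assumed formulation for illustration, not the framework's actual metric: squared errors inside a rectangular ROI get weight `w_roi`, errors outside get a smaller weight `w_bg`, the weighted mean squared error is MSE_w = Σ wᵢ(aᵢ − bᵢ)² / Σ wᵢ, and it is converted to PSNR = 10·log10(MAX² / MSE_w) in the usual way. The function name, the rectangular ROI, and the default weights (1.0 and 0.25) are all hypothetical.

```python
import math
from typing import List, Tuple

Frame = List[List[int]]
Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open: x0 <= x < x1


def roi_weighted_psnr(ref: Frame, test: Frame, roi: Rect,
                      w_roi: float = 1.0, w_bg: float = 0.25,
                      max_val: int = 255) -> float:
    """PSNR computed from a region-weighted MSE (hypothetical weighting)."""
    if len(ref) != len(test) or any(len(a) != len(b) for a, b in zip(ref, test)):
        raise ValueError("frames must have identical dimensions")
    x0, y0, x1, y1 = roi
    num = 0.0  # weighted sum of squared errors
    den = 0.0  # sum of weights
    for y, (ref_row, test_row) in enumerate(zip(ref, test)):
        for x, (a, b) in enumerate(zip(ref_row, test_row)):
            w = w_roi if (x0 <= x < x1 and y0 <= y < y1) else w_bg
            num += w * (a - b) ** 2
            den += w
    mse = num / den
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)
```

With `w_bg = w_roi` the function reduces to conventional PSNR; lowering `w_bg` makes the same error cost less when it falls outside the ROI, which matches the intuition that a pilot tolerates artifacts away from the focused region.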
The tailored encoder and decoder architecture presented
in this work ensures both the efficient encoding of raw
data and the accurate reconstruction of full-color images
on the decoder side. Through thorough evaluation, including
the specialized video quality evaluation framework, we highlight the benefits of the deferred demosaicking
approach: faster encoding while maintaining high video quality. The results of this study point to a
promising path for improving real-time video transmission
in FPV drone systems, thereby boosting their
performance in surveillance, drone racing, and immersive
user experiences.

Fig. 2 An example of an ROI-encoded video frame, displaying noticeably lower quality on the left side of the highlighted section
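On the decoder side, the rearranged planes must be put back into a Bayer mosaic before demosaicking. The sketch below, under an assumed three-plane layout (R and B at quarter resolution, the two greens of each 2×2 cell side by side in a G plane; hypothetical, since the exact arrangement is not given here), shows that the rearrangement itself is lossless: `merge_rggb` inverts `split_rggb` exactly, so any loss comes only from the video codec. The codec is replaced by an identity placeholder `transmit`, and all function names are hypothetical.

```python
from typing import List, Tuple

Plane = List[List[int]]
Planes = Tuple[Plane, Plane, Plane]


def split_rggb(mosaic: Plane) -> Planes:
    """Encoder side: R (H/2 x W/2), G (H/2 x W, greens side by side), B (H/2 x W/2)."""
    h, w = len(mosaic), len(mosaic[0])
    if h % 2 or w % 2:
        raise ValueError("mosaic dimensions must be even")
    r = [[mosaic[y][x] for x in range(0, w, 2)] for y in range(0, h, 2)]
    g = [[v for x in range(0, w, 2) for v in (mosaic[y][x + 1], mosaic[y + 1][x])]
         for y in range(0, h, 2)]
    b = [[mosaic[y + 1][x + 1] for x in range(0, w, 2)] for y in range(0, h, 2)]
    return r, g, b


def merge_rggb(r: Plane, g: Plane, b: Plane) -> Plane:
    """Decoder side: scatter plane samples back to their RGGB mosaic positions."""
    h2, w2 = len(r), len(r[0])
    mosaic = [[0] * (2 * w2) for _ in range(2 * h2)]
    for j in range(h2):
        for i in range(w2):
            y, x = 2 * j, 2 * i
            mosaic[y][x] = r[j][i]              # R
            mosaic[y][x + 1] = g[j][2 * i]      # G1 (top right)
            mosaic[y + 1][x] = g[j][2 * i + 1]  # G2 (bottom left)
            mosaic[y + 1][x + 1] = b[j][i]      # B
    return mosaic


def transmit(planes: Planes) -> Planes:
    """Placeholder for encode -> channel -> decode; a real codec would be lossy."""
    return planes


def pipeline(mosaic: Plane) -> Plane:
    r, g, b = transmit(split_rggb(mosaic))
    # A demosaicking step would run on the returned mosaic, on the decoder side.
    return merge_rggb(r, g, b)
```

Because the split and merge are pure index permutations, they add negligible cost on either side; the expensive interpolation happens only after `merge_rggb`, on the mains-powered decoder.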
2 Related work
The domain of video encoding has seen various approaches
aimed at optimizing latency, computational complexity, and
video quality. While research specifically focused on FPV
drone videos is relatively new, the techniques developed for
traditional video encoding can be adapt (...truncated)