Performance analysis of optimized versatile video coding software decoders on embedded platforms
Journal of Real-Time Image Processing
(2023) 20:120
https://doi.org/10.1007/s11554-023-01376-7
RESEARCH
Anup Saha¹ · Wassim Hamidouche²,³ · Miguel Chavarrías¹ · Fernando Pescador¹ · Ibrahim Farhat²
Received: 9 May 2023 / Accepted: 4 October 2023
© The Author(s) 2023
Abstract
In recent years, the global demand for high-resolution videos and the emergence of new multimedia applications have created
the need for a new video coding standard. Therefore, in July 2020, the versatile video coding (VVC) standard was released,
providing up to 50% bit-rate savings for the same video quality compared to its predecessor high-efficiency video coding
(HEVC). However, these bit-rate savings come at the cost of high computational complexity, particularly for live applications and on resource-constrained embedded devices. This paper evaluates two optimized VVC software decoders, named
OpenVVC and Versatile Video deCoder (VVdeC), designed for low-resource platforms. These decoders exploit optimization
techniques such as data-level parallelism using single instruction multiple data (SIMD) instructions and functional-level
parallelism using frame-, tile-, and slice-based parallelism. Furthermore, a comparison of decoding runtime, energy, and
memory consumption between the two decoders is presented, targeting two different resource-constrained embedded
devices. The results showed that both decoders achieve real-time decoding of full high-definition (FHD) resolution on the
first platform using 8 cores and high-definition (HD) real-time decoding for the second platform using only 4 cores with
comparable results in terms of the average energy consumed: around 26 J and 15 J for the 8 cores and 4 cores platforms,
respectively. Furthermore, OpenVVC showed better results regarding memory usage with a lower average maximum memory
consumed during runtime than VVdeC.
1 Introduction
This work was supported by both the Energy Efficient Enhanced
Media Streaming (3EMS) project, funded by the Brittany Region,
and the TALENT project (PID2020-116417RB-C41), funded by the
Spanish Ministerio de Ciencia e Innovación.
* Miguel Chavarrías
Anup Saha
Wassim Hamidouche
Fernando Pescador
Ibrahim Farhat
¹ CITSEM at Universidad Politécnica de Madrid, Madrid, Spain
² Univ. Rennes, INSA Rennes, CNRS, IETR-UMR 6164, Rennes, France
³ Technology Innovation Institute (TII), P.O. Box 9639, Masdar City, Abu Dhabi, UAE
A new era of information and communication technologies
is emerging, where video communication plays an essential
role in internet traffic. In particular, the significant increase
in video traffic, supported by emerging video formats and
applications, has led to the development of a new video coding standard named versatile video coding (VVC)/H.266.
The latter was standardized in July 2020 by the Joint Video
Experts Team (JVET) of the ITU-T Video Coding Experts
Group (VCEG) and the Moving Picture Experts Group
(MPEG) of ISO/IEC JTC 1/SC 29 [1]. VVC enables bit-rate
savings of up to 50% [15] with respect to the previous standard High Efficiency Video Coding (HEVC)/H.265 [2] for the
same video quality. However, this achievement comes at the
cost of 8× and 2× more complexity compared to HEVC for
the reference encoder and decoder, respectively [3]. In this
scenario, the main challenge is to develop real-time VVC
codecs, as either hardware or software solutions for video
encoding or decoding, that target the resource-constrained
embedded platforms commonly found in consumer
electronics.
Each coding standard is released with a reference software implementation available to the scientific community.
These reference implementations incorporate all the features of the standard but
are not optimized for speed. For example, in the case
of VVC, the reference software is VVC test model (VTM)
[4]. Taking this as a starting point, research groups and companies develop their own real-time software and hardware
solutions. These solutions mainly exploit the intrinsic parallelism of the algorithms, both at the data and functional
levels, to enhance their performance in terms of speed and
energy consumption. In the first case, some data operations
in the source code are optimized using single instruction
multiple data (SIMD) instructions [5]. Here, vectorized
operations apply the same mathematical operation to more
than one operand using a single processor instruction.
The other optimization route is to take advantage
of the intrinsic parallelism offered by the independent processing of
pictures [6] or smaller parts of the picture, such as slices [7]
or tiles [8]. In the latter case, the bitstream must be encoded
with these normative tools enabled, since they break
dependencies between adjacent regions.
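The functional-level route described above can be illustrated with a minimal sketch of tile-based parallelism. It assumes a uniform tile grid and uses a toy per-sample operation in place of real decoding. The names split_into_tiles, decode_tile, and decode_picture are hypothetical and are not taken from OpenVVC or VVdeC. The point is only that each tile reads and writes its own rectangle, so tiles can be handed to separate workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(width, height, cols, rows):
    """Return (x, y, w, h) rectangles covering the picture on a uniform grid."""
    xs = [width * c // cols for c in range(cols + 1)]
    ys = [height * r // rows for r in range(rows + 1)]
    return [(xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r])
            for r in range(rows) for c in range(cols)]


def decode_tile(coded, tile):
    """Hypothetical stand-in for tile decoding.

    It touches only samples inside its own rectangle, which is the
    independence property that makes tiles parallelizable.
    """
    x, y, w, h = tile
    return [[(coded[y + j][x + i] + 1) & 0xFF for i in range(w)]
            for j in range(h)]


def decode_picture(coded, cols, rows, workers=4):
    """Decode all tiles with a pool of workers and reassemble the picture."""
    height, width = len(coded), len(coded[0])
    tiles = split_into_tiles(width, height, cols, rows)
    out = [[0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input tiles.
        blocks = pool.map(lambda t: decode_tile(coded, t), tiles)
        for (x, y, w, h), block in zip(tiles, blocks):
            for j in range(h):
                out[y + j][x:x + w] = block[j]
    return out
```

In CPython, threads running pure Python code do not execute in parallel because of the global interpreter lock, so this sketch shows the work partitioning rather than an actual speed-up. Decoders written in C or C++ map the same partitioning onto native threads.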
In this work, two open-source VVC decoders are evaluated and compared against each other. These solutions,
named OpenVVC [9, 10] and Versatile Video deCoder
(VVdeC) [11] decoders, are optimized using data and functional-level parallelism techniques. This paper evaluates
their performance in decoding runtime, power consumption, and memory usage targeting two different embedded
platforms. The results showed that both decoders achieved
15 to 34 frames per second (fps) for ultra-high-definition
(UHD) sequences with quantization parameters (QPs) 27 and
37 and achieved real-time decoding of full high-definition
(FHD) and high-definition (HD) sequences on the first target platform using 8 cores. Furthermore, 16 to 28 fps were
obtained for FHD sequences with QPs 27 and 37, and
real-time decoding was achieved for all HD sequences
by OpenVVC and VVdeC when targeting the second embedded platform with 4 cores. Regarding energy consumption
and maximum memory usage, the experimental results
showed that VVdeC requires 2× more memory than
OpenVVC on both target platforms. On the other hand,
OpenVVC consumed the same energy as VVdeC on the
embedded system-on-chip 1 (ESoC1) platform with 8 cores
and around 1.25× VVdeC's energy consumption when targeting the ESoC2 embedded platform with 4 cores.
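As a reading aid for the figures above, the following sketch shows how decoding speed and energy are commonly derived from raw measurements. It assumes that "real time" means the decoded frame rate is at least the nominal frame rate of the sequence, and that energy is average power multiplied by runtime. Neither definition is stated in this section, and the function names are hypothetical.

```python
def frames_per_second(num_frames, runtime_s):
    """Decoding speed: number of decoded frames divided by wall-clock runtime."""
    return num_frames / runtime_s


def energy_joules(avg_power_w, runtime_s):
    """Energy in joules: average power in watts times runtime in seconds."""
    return avg_power_w * runtime_s


def is_real_time(decoded_fps, sequence_fps):
    """Assumed criterion: decoding keeps up with the sequence frame rate."""
    return decoded_fps >= sequence_fps
```

Under these assumptions, a decoder that finishes in less time at the same average power consumes proportionally less energy, which is why runtime and energy results tend to move together.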
The remainder of the paper is structured as follows. First,
Sect. 2 briefly introduces the VVC standard. Then, Sect. 3
describes the optimizations included in VVC decoders using
specific parallelization techniques along with the state-of-the-art of VVC decoders, followed by a brief description of
open-source VVC decoders in Sect. 4. Next, Sect. 5 details
the proposed optimization techniques in the OpenVVC
decoder. The results obtained and the comparison between
the performance of OpenVVC and VVdeC are provided in
Sect. 6. Finally, Sect. 7 concludes the paper.
2 Introduction to VVC
This section briefly describes the VVC decoder and related
codec optimizations in scientific literature. Like its predecessors, (...truncated)