Trellis-based optimization of layer extraction for rate adaptation in real-time scalable stereo video coding
© TÜBİTAK
Turk J Elec Eng & Comp Sci, Vol.20, No.4, 2012
doi:10.3906/elk-1010-880
Nükhet ÖZBEK
Department of Computer Engineering, Yaşar University, Bornova 35100, İzmir-TURKEY
e-mail:
Received: 23.10.2010
Abstract
The concept of quality layers (QLs) has been adopted in the scalable video coding standard to enable
optimal rate adaptation of precoded video in the rate-distortion sense. QLs were previously extended to stereo
and multiple-view scalable video for efficient transport of 3DTV over the Internet. However, it is not possible
to use the QL method in applications that require real-time encoding since the priority determination process
assumes the availability of the whole video sequence. In this work, a trellis-based online rate adaptation is
proposed for real-time scalable stereo video coding, with a delay of 1 group of pictures (GoP). The delay can
be controlled by selection of the GoP size according to the application, such as 16 frames for live broadcast
or 8 or 4 frames for videoconferencing. In addition, the joint optimization of layer extraction for scalable
multiview coded stereo video is also proposed. It is assumed that the encoder/extractor is aware of the available
dynamic network bandwidth in order to perform rate-distortion optimized medium-grain fidelity scalability
layer selection for each GoP. Experimental results show that the performance of the proposed online method
is very close to that of QLs, which require the whole video sequence.
Key Words: 3DTV, dynamic rate allocation, quality layers, scalable video coding, scalable multiview video
coding, trellis-based optimization
1. Introduction
Recently, scalable video coding (SVC) and stereo and multiview video have gained wide interest. The new
SVC and multiview video coding (MVC) standards, which are extensions of H.264/AVC [1], MPEG-4 part
10 (ISO/IEC 14496-10:2005/AMD3), were developed by the Joint Video Team to respond to market needs for
Internet video and 3D video, respectively. The SVC amendment was finalized in 2007 and MVC was added to the
standard in 2009 [2,3]. The joint scalable video model (JSVM) [4] and the joint multiview video model (JMVM)
[5] were developed as reference codecs to provide software implementation and to demonstrate nonnormative
encoding tools. Although the JMVM was implemented based on the JSVM, in order to take advantage of some
of the interfaces and transport mechanisms introduced for SVC, the JMVM currently does not support scalable
coding.
For adaptive streaming applications, packet-based fidelity scalability and optimized rate adaptation are
highly desirable. A low-complexity but high-performance method for packet-based fidelity scalability, also
referred to as medium-grain fidelity scalability (MGS), was adopted as a normative element of SVC [2]. MGS
operates in the transform domain and allows fragmentation of a given fidelity enhancement, which means
frequency-selective grouping of the transform coefficients [6]. This splitting of coefficients among fragments
enables graceful degradation if fragments are dropped during adaptation.
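The frequency-selective grouping described above can be illustrated with a minimal sketch. It assumes the transform coefficients of one block are already in zigzag (low-to-high frequency) order and that fragment sizes are chosen freely; the function names `split_into_fragments` and `reconstruct` are hypothetical and do not correspond to any JSVM interface.

```python
def split_into_fragments(zigzag_coeffs, sizes):
    """Group zigzag-ordered coefficients into consecutive frequency bands."""
    if sum(sizes) != len(zigzag_coeffs):
        raise ValueError("fragment sizes must cover all coefficients")
    fragments, start = [], 0
    for n in sizes:
        fragments.append(zigzag_coeffs[start:start + n])
        start += n
    return fragments


def reconstruct(fragments, kept):
    """Rebuild the coefficient list, zeroing every fragment after the first `kept`."""
    out = []
    for i, frag in enumerate(fragments):
        out.extend(frag if i < kept else [0] * len(frag))
    return out
```

Dropping the last fragment only removes the highest-frequency coefficients, which is why quality degrades gradually rather than abruptly when fragments are discarded.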
The quality layers (QLs) concept, which was designed for transmitting a priority value for each network
abstraction layer unit (NALU), was adopted in the SVC standard in order to enable an optimal adaptation in
a rate-distortion (R-D) sense [7]. However, the method of deriving suitable priority id values is not part of the
standard. The example method in [7] also presents a way to extract NALUs according to priority ids for a given
target bitrate. In this work, the QL method serves as the ground truth: the proposed method is judged by how
closely its R-D performance approaches that of the QL method.
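Priority-driven extraction for a target bitrate can be sketched as follows. This is an illustrative simplification, not the extractor of [7] or of the JSVM: the `Nalu` record, the `is_base` flag, and the convention that a smaller `priority_id` means a more important unit are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Nalu:
    size_bits: int
    priority_id: int       # assumed: smaller value = more important
    is_base: bool = False  # base-layer units are always kept


def extract_by_priority(nalus, budget_bits):
    """Keep all base-layer NALUs, then add enhancement NALUs in priority order."""
    kept = [n for n in nalus if n.is_base]
    used = sum(n.size_bits for n in kept)
    enhancement = sorted((n for n in nalus if not n.is_base),
                         key=lambda n: n.priority_id)
    for n in enhancement:
        if used + n.size_bits > budget_bits:
            break  # stop at the first unit that no longer fits
        kept.append(n)
        used += n.size_bits
    return kept, used
```

The sketch presupposes that priorities already exist for every unit, which is exactly the step that requires the whole sequence in the QL method.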
Recent advances in 3DTV technology have led to new approaches for efficient coding and transport of
multiview video (MVV). Several approaches exist for the encoding of MVV, each offering a different trade-off
between random access, ease of rate adaptation, and compression efficiency. They include simulcast coding, scalable
simulcast coding, MVC, and scalable multiview video coding (SMVC). MVC based on hierarchical B-pictures
in temporal and interview dimensions has proven to have the best performance in exhaustive experiments
conducted in the context of MPEG standardization. The effectiveness of this approach was demonstrated by
an experimental analysis of temporal versus interview prediction in terms of a Lagrange cost function in [8].
SMVC, an extension of SVC [2] to MVC, was recently introduced to combine the advantages of scalable coding
and MVC. It produced coding results superior to simulcast SVC of stereo and multiple views and proved
effective in view and/or layer switching [9,10]. SMVC uses hierarchical B-pictures
in both temporal and interview prediction. QLs were also extended to stereo and multiview scalable video
for adaptive optimized 3DTV streaming over the Internet [11,12]. However, QLs cannot be used in
applications that require real-time encoding: the priority determination process assumes that the whole
stereo video sequence is available, and R-D data must be computed for each NALU, with the distortion
computation being the most time-consuming step.
Trellis-based approaches are widely used in bit allocation by optimal quantization [13-15], video summarization [16], and optimal fidelity scaling for spatial layers [17]. In this paper, a trellis-based online rate
adaptation is proposed for real-time scalable coding of stereo videos. The algorithm assumes that the encoder/extractor is aware of the available network bandwidth and thus R-D optimized (RDO) MGS layer selection can be performed dynamically for each group of pictures (GoP). The delay can be controlled by selection
of a suitable GoP size according to the application, such as 16 frames for live broadcast or 8 or 4 frames for
videoconferencing. The paper is organized as follows: Section 2 reviews scalable stereoscopic video coding,
which was developed earlier, along with the QL concept and previous work on stereo extension of QLs. Section
3 introduces the proposed algorithm and discusses implementation and complexity issues. Section 4 provides
experimental results with monocular and stereoscopic test sequences. Conclusions are drawn in Section 5.
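The idea of per-GoP trellis optimization can be sketched generically as follows. This is a minimal illustration under stated assumptions, not the algorithm of Section 3: each frame of a GoP is assumed to offer a few candidate MGS layer counts with known integer rates and additive, independent distortions, and the function name `trellis_select` is hypothetical.

```python
def trellis_select(options, budget):
    """Pick one candidate per frame to minimize total distortion within a rate budget.

    options[i] is a list of (rate, distortion) pairs for frame i.
    Returns (total_distortion, choices) or None if no combination fits.
    """
    # Trellis state: cumulative rate -> (best distortion so far, chosen indices)
    states = {0: (0.0, [])}
    for frame_opts in options:
        nxt = {}
        for used, (dist, path) in states.items():
            for k, (rate, d) in enumerate(frame_opts):
                total = used + rate
                if total > budget:
                    continue  # prune branches that exceed the GoP budget
                cand = (dist + d, path + [k])
                # keep only the lowest-distortion path into each state
                if total not in nxt or cand[0] < nxt[total][0]:
                    nxt[total] = cand
        states = nxt
    if not states:
        return None
    return min(states.values(), key=lambda s: s[0])
```

Because only one GoP is examined at a time, the search needs the R-D points of that GoP alone, which is what bounds the delay to one GoP.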
2. Background
2.1. Scalable stereo video coding
The SMVC design [9] exploits the temporal scalability feature of the JSVM reference software [4] by sequential
interleaving of the first (right) and second (left) views in each GoP. The prediction structure, which is given in
detail in [9], supports adaptive temporal or disparity-compensated prediction such that every frame in the left
view uses past and future frames from its own view and the collocated frame from the ri (...truncated)