H.264/SVC Mode Decision Based on Mode Correlation and Desired Mode List
International Journal of Automation and Computing
11(5), October 2014, 510-516
DOI: 10.1007/s11633-014-0830-5
L. Balaji1, K. K. Thyagharajan2
1 Faculty of Information and Communication, Anna University, Chennai 600025, India
2 RMD Engineering College, Chennai 601206, India
Abstract: Video encoder design relies on fast mode decision (FMD) algorithms to reduce computational complexity while maintaining coding performance. Although H.264/scalable video coding (SVC) achieves high scalability and
coding efficiency, its exhaustive mode search is computationally expensive. In this paper, a novel algorithm is proposed
to reduce the redundant candidate modes by making use of the correlation among layers. In the base layer, a desired mode list is created
for each block based on the probability of each mode being the best mode; in the enhancement layer, candidate modes are selected using the
correlation of modes between the reference frame and the current frame. The algorithm is implemented in the joint scalable video model (JSVM)
9.19.15 reference software, and its performance is evaluated in terms of average encoding time, peak signal to noise ratio (PSNR)
and bit rate. The experimental results show a 41.89% reduction in encoding time with a minimal loss of 0.02 dB in PSNR and a 0.05%
increase in bit rate.
Keywords: H.264, scalable video coding, mode decision, mode correlation, rate distortion cost.
1 Introduction
Multimedia applications delivered through digital broadcasting
to various kinds of devices (mobile phones, laptops, personal
digital assistants (PDAs), high definition television (HDTV),
standard definition television (SDTV), etc.) are increasingly important, and they need better scalability in video
coding because the available bandwidth varies. A scalable
extension of H.264/advanced video coding (AVC) was standardized in 2007 as
H.264 scalable video coding[1]. Reference software for scalable video coding[2, 3] was developed by the joint video
team (JVT), formed jointly by the moving picture experts group (MPEG) and the video
coding experts group (VCEG).
H.264/scalable video coding (SVC) was standardized to add spatial, temporal and signal
to noise ratio (SNR) or quality scalability
to H.264/AVC[3], and its coding efficiency has been evaluated[4]. In spatial scalability, the
picture with the lowest spatial resolution is considered the base
layer and is encoded as an H.264/AVC compatible bit stream,
whereas the picture with higher resolution, which is an upsampled residue between the original and reconstructed signal
of the base layer, is considered the enhancement layer. In temporal scalability, a hierarchical B picture approach is used
for a particular spatial layer with zero structural delay.
H.264/SVC uses I, P and B pictures, in which an I/P picture is the key picture and is encoded at regular intervals using only the previous key picture as reference. The B pictures encode the pictures between two key pictures. The
size of the group of pictures (GOP) determines the number
of temporal layers in a spatial layer, where a GOP is a key picture followed by all the temporally located
Regular paper
Special Issue on Massive Visual Computing
Manuscript received January 13, 2014; accepted June 20, 2014
pictures up to the next key picture. The relation between
spatial and temporal scalability is handled by SNR or quality
scalability, which is based on different spatio-temporal reconstruction quality levels, namely coarse grain scalability
(CGS) and medium grain scalability (MGS). CGS uses a single temporal layer per spatial layer, whereas MGS
uses multiple temporal layers per spatial layer.
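The dependence of the number of temporal layers on the GOP size can be illustrated with a short sketch. It assumes a dyadic hierarchical B structure in which the GOP size is a power of two and key pictures sit at temporal layer 0; the function names are hypothetical and are not taken from JSVM.

```python
def temporal_layer_count(gop_size: int) -> int:
    """Number of temporal layers for a dyadic GOP (assumption: power-of-two size)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a positive power of two")
    # For gop_size = 2**n this equals n + 1, i.e. log2(gop_size) + 1.
    return gop_size.bit_length()


def temporal_layer(picture_index: int, gop_size: int) -> int:
    """Temporal layer of a picture in display order; key pictures are layer 0."""
    position = picture_index % gop_size
    if position == 0:
        return 0  # key picture (I/P)
    # The more trailing zero bits the position has, the lower the layer of the B picture.
    trailing_zeros = (position & -position).bit_length() - 1
    return temporal_layer_count(gop_size) - 1 - trailing_zeros


if __name__ == "__main__":
    print(temporal_layer_count(8))
    print([temporal_layer(i, 8) for i in range(9)])
```

Under these assumptions, a GOP of 8 pictures yields four temporal layers, and dropping the highest layer halves the frame rate.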
Although H.264/SVC, with a single bit stream that adapts to various bit rates, transmission channel bandwidths
and display capabilities, achieves high scalability and high
coding efficiency, the computational complexity of the encoder is very high. Because of
the hierarchical B picture approach in the temporal layer, the full search algorithm implemented in the
joint scalable video model (JSVM) must evaluate every candidate
mode to find the best prediction mode,
which is time consuming and complex for the encoder. To address this issue,
many fast mode decision (FMD) algorithms have been proposed that reduce complexity by eliminating the redundant candidate modes in H.264/SVC. These
works predict the redundant modes using the rate distortion
cost (RDC) function and the correlation within the hierarchical B picture structure. They reduce the computational complexity effectively, but at the cost of degraded
video quality, and they are not suitable for sequences with
large motion.
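The cost of the exhaustive search can be seen in a minimal sketch of rate distortion based mode decision. It assumes the usual Lagrangian cost J = D + lambda * R; the mode names follow common H.264 macroblock partitions, while the distortion and rate values in TOY_MEASUREMENTS, the Lagrange multiplier and the function names are hypothetical and serve only as an illustration, not as the JSVM implementation.

```python
from typing import Callable, Iterable, Optional, Tuple


def rd_cost(distortion: float, rate_bits: float, lagrange_multiplier: float) -> float:
    """Lagrangian rate distortion cost J = D + lambda * R."""
    return distortion + lagrange_multiplier * rate_bits


def full_search_mode(
    candidates: Iterable[str],
    evaluate: Callable[[str], Tuple[float, float]],
    lagrange_multiplier: float,
) -> Tuple[Optional[str], float]:
    """Evaluate every candidate mode and return the one with the lowest RD cost."""
    best_mode, best_cost = None, float("inf")
    for mode in candidates:
        # In a real encoder this call is the expensive step: it encodes the block in this mode.
        distortion, rate_bits = evaluate(mode)
        cost = rd_cost(distortion, rate_bits, lagrange_multiplier)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


# Hypothetical (distortion, rate in bits) measurements for one macroblock.
TOY_MEASUREMENTS = {
    "SKIP": (900.0, 1.0),
    "16x16": (400.0, 20.0),
    "8x8": (300.0, 60.0),
}

if __name__ == "__main__":
    print(full_search_mode(TOY_MEASUREMENTS, TOY_MEASUREMENTS.__getitem__, 10.0))
```

A fast mode decision algorithm shortens the `candidates` sequence before this loop runs, so that fewer modes incur the expensive evaluation.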
Nowadays, video quality is an increasingly important requirement
for the many hand held devices in use, each with its own
structural implementation. It is in the enhancement layer
that the quality has to be increased. At the same time, conserving
power on hand held devices is an important issue,
particularly for real time video applications.
Overall, both video quality and reduced computational
complexity must be treated as priorities when implementing
any algorithm.
In this paper, we focus on reducing the number of candidate modes
by using a probability model and mode correlation. The
probability model creates a list of the modes most likely to be the best
in the base layer, and mode correlation decides the best mode
in the enhancement layer. The rest of this paper is organized as
follows. Section 2 discusses background and related work
on fast mode decision algorithms implemented in SVC, the rate
distortion cost procedure, the probabilistic model and mode
correlations. Section 3 presents the proposed algorithm for complexity reduction. Section 4 discusses the experimental results with comparative analysis.
Section 5 concludes this paper.
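As a rough illustration of how a probability based mode list might be built, the sketch below ranks modes by how often each was the best mode and keeps the most frequent ones until a coverage threshold is reached. The coverage threshold, the function name and the sample history are assumptions made for this illustration only; the construction actually used by the proposed algorithm is the one discussed in Section 3.

```python
from collections import Counter
from typing import List, Sequence


def desired_mode_list(best_mode_history: Sequence[str], coverage: float = 0.8) -> List[str]:
    """Keep the most frequent best modes until they cover `coverage` of the history.

    Assumption: `coverage` is a hypothetical tuning parameter in (0, 1].
    """
    counts = Counter(best_mode_history)
    total = sum(counts.values())
    if total == 0:
        return []
    selected: List[str] = []
    covered = 0
    for mode, count in counts.most_common():
        selected.append(mode)
        covered += count
        # Compare counts rather than summed probabilities to limit rounding error.
        if covered >= coverage * total:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical record of best modes observed for previously coded blocks.
    history = ["SKIP"] * 6 + ["16x16"] * 3 + ["8x8"]
    print(desired_mode_list(history, coverage=0.8))
```

Only the modes in the returned list would then be evaluated by the rate distortion search, which is where the saving in encoding time comes from.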
2 Background and related work
Three new inter-layer prediction modes, which reuse motion vectors, residuals and
intra information from the base layer, were introduced to select the best coding mode
in the enhancement layer. These inter-layer prediction modes improve coding efficiency
while providing scalability. However, they
require rate distortion optimization (RDO) to be performed many times,
which involves very high computational complexity. In particular, the residual prediction mode requires
the RDO process to be performed twice, which doubles the computational
complexity relative to the normal RDO process of H.264/AVC. This
complexity is reduced by an efficient architecture proposed in [5] that changes the processing order;
here the prediction mode of the reference macroblock (MB) is
used to predict the ca (...truncated)