Trellis-based optimization of layer extraction for rate adaptation in real-time scalable stereo video coding

© TÜBİTAK
Turk J Elec Eng & Comp Sci, Vol.20, No.4, 2012
doi:10.3906/elk-1010-880

Nükhet ÖZBEK
Department of Computer Engineering, Yaşar University, Bornova 35100, İzmir-TURKEY
e-mail:
Received: 23.10.2010

Abstract

The concept of quality layers (QLs) has been adopted in the scalable video coding standard to enable optimal rate adaptation of precoded video in the rate-distortion sense. QLs were previously extended to stereo and multiple-view scalable video for efficient transport of 3DTV over the Internet. However, it is not possible to use the QL method in applications that require real-time encoding since the priority determination process assumes the availability of the whole video sequence. In this work, a trellis-based online rate adaptation is proposed for real-time scalable stereo video coding, with a delay of 1 group of pictures (GoP). The delay can be controlled by selection of the GoP size according to the application, such as 16 frames for live broadcast or 8 or 4 frames for videoconferencing. In addition, the joint optimization of layer extraction for scalable multiview coded stereo video is also proposed. It is assumed that the encoder/extractor is aware of the available dynamic network bandwidth in order to perform rate-distortion optimized medium-grain fidelity scalability layer selection for each GoP. Experimental results show that the performance of the proposed online method is very close to that of QLs that would require the whole video sequence.

Key Words: 3DTV, dynamic rate allocation, quality layers, scalable video coding, scalable multiview video coding, trellis-based optimization

1. Introduction

Recently, scalable video coding (SVC) and stereo and multiview video have gained wide interest.
The new SVC and multiview video coding (MVC) standards, which are extensions of H.264/AVC [1], MPEG-4 Part 10 (ISO/IEC 14496-10:2005/AMD3), were developed by the Joint Video Team to respond to market needs for Internet video and 3D video, respectively. The SVC amendment was finalized in 2007 and MVC was added to the standard in 2009 [2,3]. The joint scalable video model (JSVM) [4] and the joint multiview video model (JMVM) [5] were developed as reference codecs to provide software implementation and to demonstrate nonnormative encoding tools. Although the JMVM was implemented based on the JSVM, in order to take advantage of some of the interfaces and transport mechanisms introduced for SVC, the JMVM currently does not support scalable coding.

For adaptive streaming applications, packet-based fidelity scalability and optimized rate adaptation are highly desirable. A low-complexity but high-performance method for packet-based fidelity scalability, also referred to as medium-grain fidelity scalability (MGS), was adopted as a normative element of SVC [2]. MGS operates in the transform domain and allows fragmentation of a given fidelity enhancement, which means frequency-selective grouping of the transform coefficients [6]. This splitting of coefficients among fragments enables graceful degradation if fragments are dropped during adaptation. The quality layers (QLs) concept, which was designed for transmitting a priority value for each network abstraction layer unit (NALU), was adopted in the SVC standard in order to enable an optimal adaptation in a rate-distortion (R-D) sense [7]. However, the method of deriving suitable priority id values is not part of the standard. The example method in [7] also presents a way to extract NALUs according to priority ids for a given target bitrate.
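
As an illustration of priority-based extraction, the following Python sketch keeps the base-layer units and then adds enhancement units in decreasing order of importance until a bit budget is reached. It is not the method of [7]: the `Nalu` record, the `extract_by_priority` function, and the convention that a larger `priority` number marks a more important unit are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Nalu:
    priority: int          # assumed convention: larger value = more important
    size_bits: int         # payload size of the unit
    is_base: bool = False  # base-layer units are always kept in this sketch


def extract_by_priority(nalus, budget_bits):
    """Return (kept_units, used_bits) for one adaptation interval.

    Base-layer units are kept unconditionally. Enhancement units are then
    added in descending priority, stopping at the first unit that would
    exceed the budget.
    """
    kept = [n for n in nalus if n.is_base]
    used = sum(n.size_bits for n in kept)
    enhancement = sorted(
        (n for n in nalus if not n.is_base),
        key=lambda n: n.priority,
        reverse=True,
    )
    for unit in enhancement:
        if used + unit.size_bits > budget_bits:
            break  # lower-priority units are dropped as well
        kept.append(unit)
        used += unit.size_bits
    return kept, used
```

For instance, with a 100-bit base unit and enhancement units of 50, 40, and 30 bits in decreasing priority, a 200-bit budget admits the first two enhancement units (190 bits in total) and drops the third.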
In this work, the QL method serves as the ground truth: the proposed method is judged by how closely its R-D performance approaches that of the QL method.

Recent advances in 3DTV technology have led to new approaches for efficient coding and transport of multiview video (MVV). There are several approaches for the encoding of MVV, which provide a trade-off between random access, ease of rate adaptation, and compression efficiency; these include simulcast coding, scalable simulcast coding, MVC, and scalable multiview video coding (SMVC). MVC based on hierarchical B-pictures in temporal and interview dimensions has proven to have the best performance in exhaustive experiments conducted in the context of MPEG standardization. The effectiveness of this approach was demonstrated by an experimental analysis of temporal versus interview prediction in terms of a Lagrange cost function in [8]. In order to combine the advantages of scalable coding and MVC, SMVC, an extension of SVC [2] to MVC, was recently introduced; its coding results were superior to simulcast SVC of stereo and multiple views, and it proved effective in view and/or layer switching [9,10]. SMVC uses hierarchical B-pictures in both temporal and interview prediction.

QLs were also extended to stereo and multiview scalable video for adaptive optimized 3DTV streaming over the Internet [11,12]. However, it is not possible to use QLs in applications that require real-time encoding, because the priority determination process assumes that the whole stereo video sequence is available and R-D data must be computed for each NALU, with the distortion computation being the most time-consuming step. Trellis-based approaches are widely used in bit allocation by optimal quantization [13-15], video summarization [16], and optimal fidelity scaling for spatial layers [17]. In this paper, a trellis-based online rate adaptation is proposed for real-time scalable coding of stereo videos.
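
To make the idea of a trellis search for layer selection concrete, a minimal Python sketch is given here. It is a generic illustration and not the algorithm proposed in this paper: it assumes that each frame of a GoP offers a few candidate extraction points with known (rate, distortion) pairs, that distortions simply add up across frames (prediction dependencies between frames are ignored), and that rates are integers. The function name `select_layers` and its parameters are hypothetical.

```python
def select_layers(options, budget, step=1):
    """Pick one extraction point per frame under a total rate budget.

    options[i] is a list of (rate, distortion) pairs for frame i, one per
    candidate number of retained layers (assumed input format).
    Returns (total_distortion, choices), or None if no selection fits.
    """
    # Trellis: stage = frame index, state = cumulative rate in units of
    # `step`. Each state stores the lowest distortion reaching it and the
    # path (list of chosen option indices) that achieves it.
    stage = {0: (0.0, [])}
    for frame_options in options:
        nxt = {}
        for used, (dist, path) in stage.items():
            for k, (rate, d) in enumerate(frame_options):
                # Round the rate up so the budget is never exceeded.
                u = used + -(-rate // step)
                if u * step > budget:
                    continue  # prune branches that overshoot the budget
                cand = (dist + d, path + [k])
                if u not in nxt or cand[0] < nxt[u][0]:
                    nxt[u] = cand  # keep only the best survivor per state
        if not nxt:
            return None
        stage = nxt
    return min(stage.values(), key=lambda s: s[0])
```

With two frames offering `[(10, 9.0), (20, 4.0), (30, 1.0)]` and `[(10, 8.0), (25, 3.0)]` and a budget of 45, the search selects the second option of each frame, which uses the full budget. A larger `step` coarsens the rate states and shrinks the trellis at the cost of a more conservative use of the budget.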
The proposed algorithm assumes that the encoder/extractor is aware of the available network bandwidth and thus R-D optimized (RDO) MGS layer selection can be performed dynamically for each group of pictures (GoP). The delay can be controlled by selection of a suitable GoP size according to the application, such as 16 frames for live broadcast or 8 or 4 frames for videoconferencing.

The paper is organized as follows: Section 2 reviews scalable stereoscopic video coding, which was developed earlier, along with the QL concept and previous work on stereo extension of QLs. Section 3 introduces the proposed algorithm and discusses implementation and complexity issues. Section 4 provides experimental results with monocular and stereoscopic test sequences. Conclusions are drawn in Section 5.

2. Background

2.1. Scalable stereo video coding

The SMVC design [9] exploits the temporal scalability feature of the JSVM reference software [4] by sequential interleaving of the first (right) and second (left) views in each GoP. The prediction structure, which is given in detail in [9], supports adaptive temporal or disparity-compensated prediction such that every frame in the left view uses past and future frames from its own view and the collocated frame from the ri (...truncated)