Progressive Syntax-Rich Coding of Multichannel Audio Sources

EURASIP Journal on Applied Signal Processing 2003:10, 980–992
© 2003 Hindawi Publishing Corporation

Dai Yang
Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2564, USA

Hongmei Ai
Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2564, USA

Chris Kyriakakis
Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2564, USA

C.-C. Jay Kuo
Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2564, USA

Received 6 May 2002 and in revised form 5 March 2003

Being able to transmit the audio bitstream progressively is a highly desirable property for network transmission. MPEG-4 version 2 audio supports fine-grain bit rate scalability in the generic audio coder (GAC). It has a bit-sliced arithmetic coding (BSAC) tool, which provides scalability in steps of 1 Kbps per audio channel. Several other scalable audio coding methods have also been proposed in recent years. However, these scalable audio tools are only available for mono and stereo audio material. Little work has been done on progressive coding of multichannel audio sources. MPEG advanced audio coding (AAC) is one of the most distinguished multichannel digital audio compression systems. Based on AAC, we develop in this work a progressive syntax-rich multichannel audio codec (PSMAC). It not only supports fine-grain bit rate scalability for the multichannel audio bitstream but also provides several other desirable functionalities. A formal subjective listening test shows that the proposed algorithm achieves excellent performance at several different bit rates when compared with MPEG AAC.
Keywords and phrases: multichannel audio, progressive coding, Karhunen-Loève transform, successive quantization, PSMAC.

1. INTRODUCTION

Multichannel audio technologies have matured considerably, driven in part by the needs of the film industry and home entertainment systems. Starting from monophonic technology, new systems, such as stereophonic, quadraphonic, 5.1-channel, and 10.2-channel systems, are penetrating the market very quickly. Compared with mono or stereo sound, multichannel audio gives end users a more compelling experience and is becoming more appealing to music producers. As a result, an efficient coding scheme for multichannel audio storage and transmission is in great demand.

Among the existing multichannel audio compression algorithms, Dolby AC-3 and MPEG advanced audio coding (AAC) [1, 2, 3, 4] are the two most prevalent perceptual digital audio coding systems. Both can provide perceptually indistinguishable audio quality at a bit rate of 64 Kbps/ch. In spite of their success, they can only provide bitstreams with a fixed bit rate, which is specified during the encoding phase. When this kind of bitstream is transmitted over variable-bandwidth networks, the receiver can either successfully decode the full bitstream or ask the encoder to retransmit a bitstream with a lower bit rate. The best solution to this problem is to develop a scalable compression algorithm that transmits and decodes the audio content in an embedded manner. To be more specific, a bitstream generated by a scalable coding scheme consists of several partial bitstreams, each of which can be decoded on its own in a meaningful way. Therefore, transmitting and decoding a subset of the total bitstream results in a valid decodable signal at a lower bit rate and quality.
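The embedded property described above can be illustrated with a minimal sketch. It assumes a plain bit-plane layering of integer coefficients, which is only one common way to obtain an embedded bitstream and is not taken from the PSMAC bitstream syntax. The function names `encode_bitplanes` and `decode_bitplanes`, the up-front sign bits, and the midpoint reconstruction rule are all hypothetical choices made for this illustration.

```python
def encode_bitplanes(coeffs, num_planes):
    """Split integer coefficients into sign bits and magnitude bit planes (MSB first).

    Assumption for this sketch: every |coefficient| fits in num_planes bits and all
    sign bits are sent up front, which a practical coder would not necessarily do.
    """
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> b) & 1 for m in mags] for b in range(num_planes - 1, -1, -1)]
    return signs, planes


def decode_bitplanes(signs, planes, num_planes):
    """Reconstruct coefficients from however many leading bit planes were received."""
    mags = [0] * len(signs)
    for k, plane in enumerate(planes):
        shift = num_planes - 1 - k
        mags = [m | (bit << shift) for m, bit in zip(mags, plane)]
    received = len(planes)
    if received < num_planes:
        # Midpoint guess for the still-unknown low bits of nonzero magnitudes.
        half = 1 << (num_planes - received - 1)
        mags = [m + half if m else 0 for m in mags]
    return [-m if s else m for m, s in zip(mags, signs)]


coeffs = [37, -12, 5, 0, -50, 3]
num_planes = 6  # enough for magnitudes below 64
signs, planes = encode_bitplanes(coeffs, num_planes)

# Truncating the list of planes emulates receiving only a prefix of the bitstream.
for received in range(num_planes + 1):
    approx = decode_bitplanes(signs, planes[:received], num_planes)
    worst = max(abs(a - b) for a, b in zip(coeffs, approx))
    print(received, approx, "max error:", worst)
```

Every prefix of the plane list decodes to a valid, coarser approximation of the same coefficients, and the full list reproduces them exactly.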
This capability offers a significant advantage in transmitting content over networks with variable channel capacity and heterogeneous access bandwidth.

MPEG-4 version 2 audio coding supports fine-grain bit rate scalability [5, 6, 7, 8, 9] in its generic audio coder (GAC). It has a bit-sliced arithmetic coding (BSAC) tool, which provides scalability in steps of 1 Kbps per audio channel for mono or stereo audio material. Several other scalable mono or stereo audio coding algorithms [10, 11, 12] have been proposed in recent years. However, not much work has been done on progressive coding of multichannel audio sources.

In this work, we propose a progressive syntax-rich multichannel audio codec (PSMAC) based on MPEG AAC. In PSMAC, the interchannel redundancy inherent in the original physical channels is first removed in the preprocessing stage by using the Karhunen-Loève transform (KLT). Then, most coding blocks in the AAC main profile encoder are employed to generate spectral coefficients. Finally, a progressive transmission strategy and a context-based QM-coder are adopted to obtain the fully quality-scalable multichannel audio bitstream. The PSMAC system not only supports fine-grain bit rate scalability for the multichannel audio bitstream, but also provides several other desirable functionalities, such as random access and channel enhancement, which are not supported by other existing multichannel audio codecs (MAC). Moreover, compared with the BSAC tool provided in MPEG-4 version 2 and most other scalable audio coding tools, a more sophisticated progressive transmission strategy is employed in PSMAC.
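The KLT preprocessing step mentioned above can be sketched in a few lines. This is an illustration under stated assumptions rather than the PSMAC implementation: the transform is estimated once over the whole signal, the eigenvectors are found with a simple cyclic Jacobi iteration, and the names `covariance`, `jacobi_eigenvectors`, `klt_forward`, and `klt_inverse` as well as the synthetic three-channel test signal are hypothetical.

```python
import math


def covariance(channels):
    """Sample covariance matrix of equal-length channels, plus the channel means."""
    n = len(channels[0])
    means = [sum(ch) / n for ch in channels]
    centered = [[x - m for x in ch] for ch, m in zip(channels, means)]
    cov = [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
           for ci in centered]
    return cov, means


def jacobi_eigenvectors(sym, sweeps=50, tol=1e-18):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, rows), where rows[i] is the unit eigenvector for
    eigenvalues[i], sorted by decreasing eigenvalue.
    """
    k = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(k)] for i in range(k)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off <= tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):      # right-multiply a and v by the rotation J
                    for i in range(k):
                        mp, mq = m[i][p], m[i][q]
                        m[i][p], m[i][q] = c * mp - s * mq, s * mp + c * mq
                for j in range(k):    # left-multiply a by the transpose of J
                    ap, aq = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(k), key=lambda i: -a[i][i])
    return [a[i][i] for i in order], [[v[r][i] for r in range(k)] for i in order]


def klt_forward(channels):
    """Project mean-removed channels onto the eigenvectors of their covariance."""
    cov, means = covariance(channels)
    eigvals, basis = jacobi_eigenvectors(cov)
    k, n = len(channels), len(channels[0])
    eigen_channels = [[sum(basis[e][c] * (channels[c][t] - means[c]) for c in range(k))
                       for t in range(n)] for e in range(k)]
    return eigen_channels, basis, means, eigvals


def klt_inverse(eigen_channels, basis, means):
    """Undo klt_forward; the basis is orthonormal, so its transpose is its inverse."""
    k, n = len(basis), len(eigen_channels[0])
    return [[means[c] + sum(basis[e][c] * eigen_channels[e][t] for e in range(k))
             for t in range(n)] for c in range(k)]


# Synthetic, strongly correlated three-channel signal (illustrative only).
n = 256
common = [math.sin(0.07 * t) for t in range(n)]
side = [math.cos(0.31 * t) for t in range(n)]
extra = [math.sin(1.3 * t) for t in range(n)]
channels = [
    [c + 0.1 * s for c, s in zip(common, side)],
    [0.9 * c - 0.2 * s for c, s in zip(common, side)],
    [0.8 * c + 0.3 * s + 0.05 * e for c, s, e in zip(common, side, extra)],
]
eigen_channels, basis, means, eigvals = klt_forward(channels)
restored = klt_inverse(eigen_channels, basis, means)
print("variance per eigen-channel:", eigvals)
```

After the transform, the eigen-channels are mutually uncorrelated and most of the variance is concentrated in the first one, which is what makes the later coding stages cheaper. The decoder needs the basis and the means as side information to invert the transform.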
PSMAC not only encodes spectral coefficients from MSB to LSB and from low to high frequency, so that the decoder can reconstruct these coefficients more and more precisely and with increasing bandwidth as the receiver collects more bits from the bitstream, but also uses the psychoacoustic model to control the subband transmission sequence so that the most sensitive frequency area is reconstructed more precisely. In this way, bits that would otherwise encode coefficients in nonsensitive frequency areas are saved and used to encode coefficients in the sensitive frequency area. As a result of this subband selection strategy, PSMAC can reconstruct perceptually more appealing audio, especially at very low bit rates such as 16 Kbps/ch. The side information required to encode the subband transmission sequence is carefully handled in our implementation so that the overall overhead does not have a significant impact on the audio quality even at very low bit rates. Note that Shen et al. [12] proposed a subband selection rule to achieve progressive coding. However, Shen’s scheme demands a large amount of overhead in coding the selection order.

Experimental results show that, when compared with MPEG AAC, the decoded multichannel audio generated by the proposed PSMAC’s mask-to-noise-ratio (MNR) progressive mode has comparable quality at high bit rates, such as 64 Kbps/ch or 48 Kbps/ch, and much better quality at low bit rates, such as 32 Kbps/ch or 16 Kbps/ch. W (...truncated)