Progressive Syntax-Rich Coding of Multichannel Audio Sources
EURASIP Journal on Applied Signal Processing 2003:10, 980–992
© 2003 Hindawi Publishing Corporation
Dai Yang
Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California,
Los Angeles, CA 90089-2564, USA
Hongmei Ai
Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California,
Los Angeles, CA 90089-2564, USA
Chris Kyriakakis
Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California,
Los Angeles, CA 90089-2564, USA
C.-C. Jay Kuo
Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California,
Los Angeles, CA 90089-2564, USA
Received 6 May 2002 and in revised form 5 March 2003
Being able to transmit the audio bitstream progressively is a highly desirable property for network transmission. MPEG-4 version 2
audio supports fine-grain bit rate scalability in the generic audio coder (GAC). It has a bit-sliced arithmetic coding (BSAC) tool,
which provides scalability in steps of 1 Kbps per audio channel. Several other scalable audio coding methods have also been
proposed in recent years. However, these scalable audio tools are only available for mono and stereo audio
material, and little work has been done on progressive coding of multichannel audio sources. MPEG advanced audio coding (AAC)
is one of the most prominent multichannel digital audio compression systems. Based on AAC, we develop in this work a
progressive syntax-rich multichannel audio codec (PSMAC). It not only supports fine-grain bit rate scalability for the multichannel
audio bitstream but also provides several other desirable functionalities. A formal subjective listening test shows that the proposed
algorithm achieves excellent performance at several different bit rates when compared with MPEG AAC.
Keywords and phrases: multichannel audio, progressive coding, Karhunen-Loève transform, successive quantization, PSMAC.
1. INTRODUCTION
Multichannel audio technologies have matured considerably
in recent years, driven in part by the needs of the film
industry and home entertainment systems. Starting from
monophonic technology, new systems such as stereophonic,
quadraphonic, 5.1-channel, and 10.2-channel systems are penetrating the market very quickly. Compared with mono
or stereo sound, multichannel audio provides end users with a
more compelling experience and is more appealing
to music producers. As a result, efficient coding schemes
for multichannel audio storage and transmission are in great
demand. Among existing multichannel audio compression algorithms, Dolby AC-3 and MPEG advanced audio coding (AAC) [1, 2, 3, 4] are the two most prevalent perceptual digital audio coding systems. Both can provide
perceptually indistinguishable audio quality at a bit rate of
64 Kbps/ch.
In spite of their success, both systems can only provide bitstreams
with a fixed bit rate, which is specified during the encoding phase. When such a bitstream is transmitted over
variable-bandwidth networks, the receiver can either decode the full bitstream successfully or ask the encoder to retransmit a bitstream with a lower bit rate. The best solution to
this problem is to develop a scalable compression algorithm
that transmits and decodes the audio content in an embedded
manner. More specifically, a bitstream generated by a
scalable coding scheme consists of several partial bitstreams,
each of which can be decoded on its own in a meaningful way. Therefore, transmitting and decoding a subset of
the total bitstream yields a valid decodable signal at a
lower bit rate and quality. This capability offers a significant
advantage when transmitting content over networks with variable channel capacity and heterogeneous access bandwidth.
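Because every prefix of an embedded bitstream is decodable, adapting to the available bandwidth reduces to cutting each coded frame to a byte budget. The sketch below illustrates this under stated assumptions: `frame_budget_bytes` and `truncate_frames` are hypothetical helpers, 1 Kbps is taken as 1000 bit/s, frames are assumed to cover 1024 samples (the AAC long-block frame length) at 44.1 kHz, and any frame headers that must survive truncation are ignored.

```python
from typing import List


def frame_budget_bytes(kbps_per_channel: int, channels: int,
                       samples_per_frame: int = 1024, sample_rate: int = 44100) -> int:
    """Bytes one frame may occupy at the target rate (assumes 1 Kbps = 1000 bit/s)."""
    # Integer arithmetic: bits available for this frame, rounded down.
    bits = kbps_per_channel * 1000 * channels * samples_per_frame // sample_rate
    return bits // 8


def truncate_frames(frames: List[bytes], kbps_per_channel: int, channels: int,
                    samples_per_frame: int = 1024, sample_rate: int = 44100) -> List[bytes]:
    """Cut every embedded frame to its budget; each kept prefix stays decodable."""
    budget = frame_budget_bytes(kbps_per_channel, channels, samples_per_frame, sample_rate)
    return [frame[:budget] for frame in frames]
```

For a five-channel stream, a 16 Kbps/ch budget keeps 232 bytes per frame and a 64 Kbps/ch budget keeps 928; the low-rate frames are exact prefixes of the high-rate ones, which is the embedded property the paragraph describes.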
MPEG-4 version 2 audio coding supports fine-grain bit
rate scalability [5, 6, 7, 8, 9] in its generic audio coder (GAC).
It has a bit-sliced arithmetic coding (BSAC) tool, which provides scalability in steps of 1 Kbps per audio channel for
mono or stereo audio material. Several other scalable mono
or stereo audio coding algorithms [10, 11, 12] have been proposed
in recent years. However, little work has been done on
progressive coding of multichannel audio sources. In this
work, we propose a progressive syntax-rich multichannel audio codec (PSMAC) based on MPEG AAC. In PSMAC, the
interchannel redundancy inherent in the original physical channels is first removed in the preprocessing stage by the
Karhunen-Loève transform (KLT). Then, most coding blocks
of the AAC main profile encoder are employed to generate
spectral coefficients. Finally, a progressive transmission strategy and a context-based QM-coder are adopted to obtain a
fully quality-scalable multichannel audio bitstream. The PSMAC system not only supports fine-grain bit rate scalability for the multichannel audio bitstream but also provides
several other desirable functionalities, such as random access
and channel enhancement, which are not supported
by other existing multichannel audio codecs (MACs).
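The KLT preprocessing step can be illustrated in its simplest form. PSMAC applies the KLT across all physical channels; to stay dependency-free, this sketch shows only a two-channel case, where diagonalizing the 2×2 interchannel matrix reduces to a single rotation angle. The helper names `klt_2ch` and `inverse_klt_2ch` are hypothetical, and the use of the raw second-moment matrix (no mean removal) over the whole signal, rather than per block, is an assumption for illustration.

```python
import math
from typing import List, Sequence, Tuple


def klt_2ch(x: Sequence[float], y: Sequence[float]) -> Tuple[List[float], List[float], float]:
    """Decorrelate two channels with a KLT (principal-axis rotation).

    Uses the second-moment matrix [[a, c], [c, b]] of the raw samples.
    theta = 0.5 * atan2(2c, a - b) diagonalizes it, so the two outputs are
    uncorrelated and p1 carries the larger share of the energy.
    """
    a = sum(v * v for v in x)
    b = sum(v * v for v in y)
    c = sum(u * v for u, v in zip(x, y))
    theta = 0.5 * math.atan2(2.0 * c, a - b)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    p1 = [cos_t * u + sin_t * v for u, v in zip(x, y)]   # principal component
    p2 = [-sin_t * u + cos_t * v for u, v in zip(x, y)]  # low-energy residual
    return p1, p2, theta


def inverse_klt_2ch(p1: Sequence[float], p2: Sequence[float],
                    theta: float) -> Tuple[List[float], List[float]]:
    """Undo the rotation; the KLT matrix is orthonormal, so its inverse is its transpose."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    x = [cos_t * u - sin_t * v for u, v in zip(p1, p2)]
    y = [sin_t * u + cos_t * v for u, v in zip(p1, p2)]
    return x, y
```

For strongly correlated inputs such as `[1.0, 2.0, 3.0, 4.0]` and `[1.1, 1.9, 3.2, 3.9]`, almost all energy lands in `p1`, leaving `p2` cheap to code. With more channels the same idea uses the eigenvectors of the full N×N interchannel matrix, for example via `numpy.linalg.eigh`.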
Moreover, compared with the BSAC tool provided in
MPEG-4 version 2 and most other scalable audio
coding tools, PSMAC employs a more sophisticated progressive transmission
strategy. PSMAC not only encodes spectral coefficients from MSB to LSB and from low to
high frequency, so that the decoder can reconstruct these coefficients with increasing precision and bandwidth as the receiver collects more bits from the
bitstream, but also uses the psychoacoustic model to control the subband transmission sequence so that the most perceptually sensitive frequency region is reconstructed more precisely. In this
way, bits that would otherwise encode coefficients in insensitive frequency regions are saved and used to encode coefficients in
the sensitive frequency region. As a result of this subband selection strategy, PSMAC reconstructs perceptually more appealing audio, especially at very low bit rates such
as 16 Kbps/ch. The side information required to encode the
subband transmission sequence is carefully handled in our
implementation so that the overall overhead does not
significantly affect audio quality even at very low bit
rates. Note that Shen et al. [12] also proposed a subband selection
rule to achieve progressive coding. However, Shen's scheme
requires a large amount of overhead to code the selection
order.
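The combination of MSB-to-LSB coding and subband prioritization can be sketched as bit-plane coding in which, within each plane, coefficients are visited in a priority order. This is a minimal illustration, not the PSMAC coder: `band_priority` is a hypothetical per-band score standing in for the psychoacoustic model, coefficients are assumed to be already quantized integers, a sign bit is assumed to follow each coefficient's first 1-bit, the context-based QM-coder and the coding of the order itself are omitted, and truncated coefficients are reconstructed at the lower bound rather than a refined midpoint.

```python
from typing import List, Sequence


def coefficient_order(band_edges: Sequence[int], band_priority: Sequence[float]) -> List[int]:
    """Visit order: coefficients of the highest-priority subband first.

    band_edges[k]..band_edges[k + 1] delimit subband k. Ties keep low-to-high
    frequency order because Python's sort is stable.
    """
    bands = sorted(range(len(band_edges) - 1), key=lambda k: band_priority[k], reverse=True)
    return [i for k in bands for i in range(band_edges[k], band_edges[k + 1])]


def encode_bitplanes(q: Sequence[int], order: Sequence[int], num_planes: int) -> List[int]:
    """Emit magnitude bits MSB -> LSB; a sign bit (1 = negative) follows each
    coefficient's first 1-bit, so every prefix of the output is decodable."""
    bits: List[int] = []
    significant = [False] * len(q)
    for p in range(num_planes - 1, -1, -1):
        for i in order:
            b = (abs(q[i]) >> p) & 1
            bits.append(b)
            if b and not significant[i]:
                significant[i] = True
                bits.append(1 if q[i] < 0 else 0)
    return bits


def decode_bitplanes(bits: Sequence[int], n: int, order: Sequence[int],
                     num_planes: int) -> List[int]:
    """Rebuild coefficients from any prefix of encode_bitplanes' output."""
    mag = [0] * n
    negative = [False] * n
    significant = [False] * n
    stream = iter(bits)

    def result() -> List[int]:
        return [-m if neg else m for m, neg in zip(mag, negative)]

    for p in range(num_planes - 1, -1, -1):
        for i in order:
            b = next(stream, None)
            if b is None:          # stream truncated: return the coarse approximation
                return result()
            if b:
                mag[i] |= 1 << p
                if not significant[i]:
                    s = next(stream, None)
                    if s is None:  # sign bit was cut off: drop this coefficient
                        mag[i] = 0
                        return result()
                    significant[i] = True
                    negative[i] = s == 1
    return result()
```

With coefficients `[5, -3, 12, 0]`, subband edges `[0, 2, 4]`, and priorities `[0.2, 0.9]`, the upper subband is refined first: a stream cut after the first plane already reconstructs the 12 as 8 while the lower subband is still zero, and the full 19-bit stream recovers the input exactly.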
Experimental results show that, compared with
MPEG AAC, the multichannel audio decoded in the
proposed PSMAC's mask-to-noise ratio (MNR) progressive mode has comparable quality at high bit rates, such as
64 Kbps/ch or 48 Kbps/ch, and much better quality at low bit
rates, such as 32 Kbps/ch or 16 Kbps/ch. W (...truncated)