Abstract
We use the Modified Discrete Cosine Transform (MDCT) to analyze and synthesize spatial parameters. The MDCT by itself lacks phase information and does not conserve energy, both of which are needed to represent spatial parameters. Complementing the MDCT with the Modified Discrete Sine Transform (MDST) to form “MDCT-j*MDST” overcomes this and yields a representation similar to that of the DFT. Owing to overlap-add in the time domain, an MDST spectrum can be built exactly from the MDCT spectra of neighboring frames through a matrix-vector multiplication. The matrix is concentrated around its main diagonal, and keeping only a small number of its sub-diagonals is sufficient for approximation. When an MDCT-based core coder such as Advanced Audio Coding (AAC) is used in spatial audio coding, no separate transform is needed for spatial processing, which significantly reduces the computational complexity. Subjective listening tests also show that MDCT-domain spatial processing introduces no quality impairment.








Acknowledgement
This research was supported by the National Science Foundation of China (grant 60832002) and by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2009-C1090-0902-0020).
Appendix
1.1 A. MDFT energy conservation
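As in (3.a) and (3.b), \( {c_0}, \ldots, {c_{N - 1}} \) and \( {s_0}, \ldots, {s_{N - 1}} \) are the 2N-dimensional basis vectors of the MDCT and the MDST, respectively. The inner products between them are
\[ c_k^{\text{T}}{c_l} = N\delta \left( {k - l} \right),\quad s_k^{\text{T}}{s_l} = N\delta \left( {k - l} \right),\quad c_k^{\text{T}}{s_l} = 0,\quad k,l = 0, \ldots, N - 1, \qquad ({\text{A}}.1) \]
where δ(•) is the unit impulse function. Together they form an orthogonal basis of the 2N-dimensional real vector space. Hence, for a time signal \( x(n),\,n = 0, \ldots, 2N - 1 \), with MDCT spectrum X(k) and MDST spectrum \( Y(k),\,k = 0, \ldots, N - 1 \), the energies satisfy
\[ \sum\limits_{k = 0}^{N - 1} {\left[ {{X^2}(k) + {Y^2}(k)} \right]} = N\sum\limits_{n = 0}^{2N - 1} {{x^2}(n)}. \qquad ({\text{A}}.2) \]
This verifies that the MDFT spectral energy is N times the temporal energy.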
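The orthogonality and energy statements can be checked numerically. The sketch below is only an illustration: definitions (3.a) and (3.b) are not reproduced here, so it assumes the common unscaled basis with phase offset n0 = 1/2 + N/2, and the helper names `basis` and `dot` are hypothetical.

```python
import math
import random

def basis(N, kind):
    """Return N basis vectors (k = 0..N-1), each with 2N samples.

    Assumed form (standard MDCT/MDST phase offset n0 = 1/2 + N/2):
        c_k(n) = cos(pi/N * (n + n0) * (k + 1/2))
        s_k(n) = sin(pi/N * (n + n0) * (k + 1/2))
    """
    f = math.cos if kind == "cos" else math.sin
    n0 = 0.5 + N / 2
    return [[f(math.pi / N * (n + n0) * (k + 0.5)) for n in range(2 * N)]
            for k in range(N)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

N = 8
vectors = basis(N, "cos") + basis(N, "sin")   # c_0..c_{N-1}, s_0..s_{N-1}

# Largest deviation of the 2N x 2N Gram matrix from N * I.
gram_error = max(
    abs(dot(u, v) - (N if i == j else 0.0))
    for i, u in enumerate(vectors)
    for j, v in enumerate(vectors)
)

# Energy of a random test frame in both domains.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2 * N)]
spectral_energy = sum(dot(b, x) ** 2 for b in vectors)   # sum of X(k)^2 + Y(k)^2
temporal_energy = dot(x, x)

print(gram_error < 1e-9)
print(math.isclose(spectral_energy, N * temporal_energy))
```

Under these assumptions the Gram matrix equals N times the identity up to rounding, which is what makes the spectral energy exactly N times the temporal energy.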
1.2 B. MDFT time shift and phase shift
From the MDFT definition in (3.c), when the time signal x(n) is shifted by d and satisfies \( x\left( {n - 2N} \right) = - x(n) \), its MDFT spectrum becomes
where Z(k) is the MDFT spectrum of x(n) without the shift. The condition \( x\left( {n - 2N} \right) = - x(n) \) parallels the DFT’s periodicity requirement, but with a negative sign. For real signals and d ≪ 2N, (A.3) is an approximation.
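A small numerical experiment illustrates the time-shift property. It is a sketch under assumptions, since (3.c) is not reproduced here: the MDFT is taken as Z(k) = X(k) − jY(k) with the usual phase offset n0 = 1/2 + N/2, the shift is taken as x(n − d), and the predicted phase factor exp(−jπd(k + 1/2)/N) follows from that assumed definition. The helpers `mdft` and `antiperiodic` are hypothetical names.

```python
import cmath
import math
import random

def mdft(frame):
    """Z(k) = X(k) - j*Y(k) for one 2N-sample frame (assumed definition)."""
    N = len(frame) // 2
    n0 = 0.5 + N / 2
    return [sum(v * cmath.exp(-1j * math.pi / N * (n + n0) * (k + 0.5))
                for n, v in enumerate(frame))
            for k in range(N)]

def antiperiodic(frame, n):
    """Extend a 2N-sample frame so that x(n - 2N) = -x(n) for every integer n."""
    period = len(frame)
    sign = -1.0 if (n // period) % 2 else 1.0
    return sign * frame[n % period]

N, d = 8, 3
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(2 * N)]
shifted = [antiperiodic(x, n - d) for n in range(2 * N)]   # x(n - d)

Z = mdft(x)
Z_shifted = mdft(shifted)

# Under the assumed definition, a shift by d multiplies bin k by a pure phase term.
predicted = [cmath.exp(-1j * math.pi * d * (k + 0.5) / N) * Z[k] for k in range(N)]
shift_error = max(abs(a - b) for a, b in zip(Z_shifted, predicted))

print(shift_error < 1e-9)
```

With the anti-periodic extension the relation holds to rounding error; replacing `antiperiodic` with zero padding or ordinary periodic extension shows how the relation degrades into an approximation.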
1.3 C. Windowed MDFT
Let X(k) and Y(k) be the sine-windowed MDCT spectrum and the cosine-windowed MDST spectrum, respectively. Then we have
and
Taking (A.4) and (A.5) as the real part and the imaginary part, respectively, gives
which is a 2N-point DFT with a phase shift. Moreover, with \( {Z_{+} }\left( { - 1} \right) = - {Z_{-} }(0) \) and \( {Z_{-} }(N) = {Z_{+} }\left( {N - 1} \right) \), (A.6) leads to (5.a).
1.4 D. Properties of MDCT and MDST transform matrices
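From (6), each column vector of \( {{\mathbf{C}}_0} \) and \( {{\mathbf{S}}_1} \) is odd-symmetric, and each column vector of \( {{\mathbf{C}}_1} \) and \( {{\mathbf{S}}_0} \) is even-symmetric. With the anti-diagonal matrix \( {\mathbf{J}} \), whose only nonzero entries are ones on its anti-diagonal, these symmetries are equivalent to \( {\mathbf{J}}{{\mathbf{C}}_0} = - {{\mathbf{C}}_0},\,{\mathbf{J}}{{\mathbf{S}}_1} = - {{\mathbf{S}}_1} \) and \( {\mathbf{J}}{{\mathbf{C}}_1} = {{\mathbf{C}}_1},\,{\mathbf{J}}{{\mathbf{S}}_0} = {{\mathbf{S}}_0} \), respectively. From this and \( {{\mathbf{J}}^{\text{T}}}{\mathbf{J}} = {\mathbf{JJ}} = {\mathbf{I}} \), we have
\[ {\mathbf{S}}_0^{\text{T}}{{\mathbf{C}}_0} = {\mathbf{S}}_0^{\text{T}}{{\mathbf{J}}^{\text{T}}}{\mathbf{J}}{{\mathbf{C}}_0} = {\left( {{\mathbf{J}}{{\mathbf{S}}_0}} \right)^{\text{T}}}\left( {{\mathbf{J}}{{\mathbf{C}}_0}} \right) = - {\mathbf{S}}_0^{\text{T}}{{\mathbf{C}}_0}, \qquad ({\text{A}}.7) \]
which implies \( {\mathbf{S}}_0^{\text{T}}{{\mathbf{C}}_0} = {\mathbf{0}} \). For the same reason, \( {\mathbf{S}}_1^{\text{T}}{{\mathbf{C}}_1} = {\mathbf{0}} \). For the windowed case, using the second equation of (14), \( {{\mathbf{W}}_1} = {\mathbf{J}}{{\mathbf{W}}_0}{\mathbf{J}} \), and the fact that \( {{\mathbf{W}}_0} \) and \( {{\mathbf{W}}_1} \) are diagonal matrices and therefore commute, \( {{\mathbf{W}}_0}{{\mathbf{W}}_1} = {{\mathbf{W}}_1}{{\mathbf{W}}_0} \), we have
\[ {\mathbf{S}}_0^{\text{T}}{{\mathbf{W}}_1}{{\mathbf{W}}_0}{{\mathbf{C}}_0} = {\left( {{\mathbf{J}}{{\mathbf{S}}_0}} \right)^{\text{T}}}\left( {{\mathbf{J}}{{\mathbf{W}}_1}{\mathbf{J}}} \right)\left( {{\mathbf{J}}{{\mathbf{W}}_0}{\mathbf{J}}} \right)\left( {{\mathbf{J}}{{\mathbf{C}}_0}} \right) = - {\mathbf{S}}_0^{\text{T}}{{\mathbf{W}}_0}{{\mathbf{W}}_1}{{\mathbf{C}}_0} = - {\mathbf{S}}_0^{\text{T}}{{\mathbf{W}}_1}{{\mathbf{W}}_0}{{\mathbf{C}}_0}, \qquad ({\text{A}}.8) \]
which implies \( {\mathbf{S}}_0^{\text{T}}{{\mathbf{W}}_1}{{\mathbf{W}}_0}{{\mathbf{C}}_0} = {\mathbf{0}} \). For the same reason, \( {\mathbf{S}}_1^{\text{T}}{{\mathbf{W}}_0}{{\mathbf{W}}_1}{{\mathbf{C}}_1} = {\mathbf{0}} \). Also, by a procedure similar to (A.8), we have \( {\mathbf{S}}_0^{\text{T}}{{\mathbf{W}}_1}{{\mathbf{W}}_1}{{\mathbf{C}}_1} = {\mathbf{S}}_0^{\text{T}}{{\mathbf{W}}_0}{{\mathbf{W}}_0}{{\mathbf{C}}_1} \). From this and the first equation of (14), \( {{\mathbf{W}}_0}{{\mathbf{W}}_0} + {{\mathbf{W}}_1}{{\mathbf{W}}_1} = {\mathbf{I}} \), we can see
\[ {\mathbf{S}}_0^{\text{T}}{{\mathbf{W}}_0}{{\mathbf{W}}_0}{{\mathbf{C}}_1} = {\mathbf{S}}_0^{\text{T}}{{\mathbf{W}}_1}{{\mathbf{W}}_1}{{\mathbf{C}}_1} = {\mathbf{S}}_0^{\text{T}}{{\mathbf{C}}_1}/2. \qquad ({\text{A}}.9) \]
For the same reason, \( {\mathbf{S}}_1^{\text{T}}{{\mathbf{W}}_0}{{\mathbf{W}}_0}{{\mathbf{C}}_0} = {\mathbf{S}}_1^{\text{T}}{{\mathbf{C}}_0}/2 \).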
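The symmetry-based identities of this section can be verified numerically. The following is a sketch under assumptions: (6) and (14) are not reproduced here, so it uses the common unscaled MDCT/MDST kernels with phase offset n0 = 1/2 + N/2, split into first-half and second-half N × N blocks, and a sine window as one window that satisfies both conditions quoted from (14). The helpers `block`, `product` and `max_abs_diff` are hypothetical names.

```python
import math

N = 8
n0 = 0.5 + N / 2

def block(f, frame_half):
    """N x N block (rows: time, columns: frequency) for frame half 0 or 1."""
    return [[f(math.pi / N * (n + frame_half * N + n0) * (k + 0.5)) for k in range(N)]
            for n in range(N)]

C0, C1 = block(math.cos, 0), block(math.cos, 1)
S0, S1 = block(math.sin, 0), block(math.sin, 1)

# Diagonals of W0 and W1 for a sine window (assumed choice).
w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]
w0, w1 = w[:N], w[N:]

def product(A, diag, B):
    """Return A^T diag(diag) B as a list of rows."""
    return [[sum(A[n][k] * diag[n] * B[n][l] for n in range(N)) for l in range(N)]
            for k in range(N)]

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

ones = [1.0] * N
half_weights = [0.5] * N
zero = [[0.0] * N for _ in range(N)]
w0w1 = [a * b for a, b in zip(w0, w1)]
w0w0 = [a * a for a in w0]

# Each entry is the largest absolute deviation from the stated identity.
checks = {
    "S0^T C0 = 0": max_abs_diff(product(S0, ones, C0), zero),
    "S1^T C1 = 0": max_abs_diff(product(S1, ones, C1), zero),
    "S0^T W1 W0 C0 = 0": max_abs_diff(product(S0, w0w1, C0), zero),
    "S0^T W0 W0 C1 = S0^T C1 / 2":
        max_abs_diff(product(S0, w0w0, C1), product(S0, half_weights, C1)),
    "S1^T W0 W0 C0 = S1^T C0 / 2":
        max_abs_diff(product(S1, w0w0, C0), product(S1, half_weights, C0)),
}

for name, err in checks.items():
    print(name, err < 1e-9)
```

Any other window with \( {{\mathbf{W}}_1} = {\mathbf{J}}{{\mathbf{W}}_0}{\mathbf{J}} \) and \( {{\mathbf{W}}_0}{{\mathbf{W}}_0} + {{\mathbf{W}}_1}{{\mathbf{W}}_1} = {\mathbf{I}} \) could be substituted for the sine window in `w`.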
1.5 E. Properties of the conversion matrix T
As in (7.b), \( {\mathbf{P}} \) is a diagonal matrix with \( + 1, - 1, + 1, - 1, \ldots \) on its diagonal, implying \( {\mathbf{P}}{{\mathbf{P}}^{\text{T}}} = {\mathbf{I}} \). With \( {{\mathbf{S}}_0} = - {{\mathbf{C}}_1}{\mathbf{P}},\,{{\mathbf{S}}_1} = {{\mathbf{C}}_0}{\mathbf{P}} \) in (7.b), we have \( {{\mathbf{S}}_1}{\mathbf{S}}_1^{\text{T}} = {{\mathbf{C}}_0}{\mathbf{C}}_0^{\text{T}},\,{{\mathbf{S}}_0}{\mathbf{S}}_0^{\text{T}} = {{\mathbf{C}}_1}{\mathbf{C}}_1^{\text{T}},\,{{\mathbf{S}}_0}{\mathbf{S}}_1^{\text{T}} = - {{\mathbf{C}}_1}{\mathbf{C}}_0^{\text{T}},\,{{\mathbf{S}}_1}{\mathbf{S}}_0^{\text{T}} = - {{\mathbf{C}}_0}{\mathbf{C}}_1^{\text{T}} \). With the help of \( {\mathbf{C}}_0^{\text{T}}{{\mathbf{C}}_0} + {\mathbf{C}}_1^{\text{T}}{{\mathbf{C}}_1} = N{\mathbf{I}} \) and \( {{\mathbf{C}}_1}{\mathbf{C}}_0^{\text{T}} = {{\mathbf{C}}_0}{\mathbf{C}}_1^{\text{T}} = {\mathbf{0}} \) in (7.a), the conversion matrix defined in (10.b) is orthogonal.
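The building blocks quoted from (7.a) and (7.b) can be checked numerically. This sketch assumes the common unscaled MDCT/MDST kernels with phase offset n0 = 1/2 + N/2, since the definitions themselves are not reproduced here, and it does not construct the conversion matrix T because (10.b) is not available. The helper names are hypothetical.

```python
import math

N = 8
n0 = 0.5 + N / 2

def block(f, frame_half):
    """N x N block (rows: time, columns: frequency) for frame half 0 or 1."""
    return [[f(math.pi / N * (n + frame_half * N + n0) * (k + 0.5)) for k in range(N)]
            for n in range(N)]

C0, C1 = block(math.cos, 0), block(math.cos, 1)
S0, S1 = block(math.sin, 0), block(math.sin, 1)
p = [(-1) ** k for k in range(N)]   # diagonal of P: +1, -1, +1, ...

def max_abs(M):
    return max(abs(v) for row in M for v in row)

# (7.b): S0 = -C1 P and S1 = C0 P. Right-multiplying by P scales column k by p[k].
err_S0 = max_abs([[S0[n][k] + C1[n][k] * p[k] for k in range(N)] for n in range(N)])
err_S1 = max_abs([[S1[n][k] - C0[n][k] * p[k] for k in range(N)] for n in range(N)])

# (7.a): C0^T C0 + C1^T C1 = N I (frequency x frequency).
err_gram = max_abs([[sum(C0[n][k] * C0[n][l] + C1[n][k] * C1[n][l] for n in range(N))
                     - (N if k == l else 0)
                     for l in range(N)] for k in range(N)])

# (7.a): C1 C0^T = 0 (time x time); C0 C1^T is its transpose.
err_cross = max_abs([[sum(C1[n][k] * C0[m][k] for k in range(N)) for m in range(N)]
                     for n in range(N)])

print(err_S0 < 1e-9, err_S1 < 1e-9, err_gram < 1e-9, err_cross < 1e-9)
```

The four outer-product identities for \( {{\mathbf{S}}_0} \) and \( {{\mathbf{S}}_1} \) then follow directly from the two (7.b) relations together with \( {\mathbf{P}}{{\mathbf{P}}^{\text{T}}} = {\mathbf{I}} \).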
Cite this article
Chen, S., Xiong, N., Hyuk Park, J. et al. Spatial parameters for audio coding: MDCT domain analysis and synthesis. Multimed Tools Appl 48, 225–246 (2010). https://doi.org/10.1007/s11042-009-0326-4