Optimally truncating head-related impulse response by dynamic programming with its applications

You, Shingchern D.; Chen, Woei-Kae

doi:10.1007/s11042-012-1234-6

Published: 13 September 2012

Volume 70, pages 2167–2188 (2014)

Multimedia Tools and Applications

Abstract

This paper proposes a method for optimally truncating head-related impulse responses (HRIRs). Each truncated HRIR consists of a portion of the original HRIR and a flat line. An algorithm based on dynamic programming selects the portions of the original HRIRs and the constants of the flat lines so as to minimize the modeling errors. The truncated HRIRs can be used to reproduce multi-channel sound over headphones at a significantly lower computational cost. The proposed method is compared with another approximation method, the CAPZ (Common-Acoustical-Pole and Zero) approach. The experimental results show that, for the same amount of computation, the proposed method yields lower composition errors as well as lower modeling errors. Compared with the direct implementation, the proposed approach requires about 35% of the computational cost while maintaining acceptable composition errors.
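The allocation idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given here. It assumes that each truncated response keeps one contiguous segment of the original and replaces all other samples with a single constant (their mean, which minimizes the squared error), that the total modeling error is the sum of the per-response squared errors, and that the budget is at least the number of responses and no larger than a response length. All function names are hypothetical.

```python
from itertools import product


def best_segment_error(h, length):
    """Smallest squared error over all starting points when `length`
    contiguous samples of h are kept and the rest become one constant."""
    n = len(h)
    if length >= n:
        return 0.0
    total = sum(h)
    total_sq = sum(x * x for x in h)
    best = float("inf")
    for start in range(n - length + 1):
        kept = h[start:start + length]
        rest_sum = total - sum(kept)
        rest_sq = total_sq - sum(x * x for x in kept)
        # Squared deviation of the discarded samples from their mean.
        err = rest_sq - rest_sum * rest_sum / (n - length)
        best = min(best, err)
    return max(best, 0.0)  # guard against tiny negative rounding errors


def allocate_dp(responses, budget):
    """Split `budget` kept samples among the responses (at least 1 each)
    so that the summed error is minimal. Returns (error, lengths)."""
    f = len(responses)
    # err[i][n]: best error for response i when it keeps n samples.
    err = [[None] + [best_segment_error(h, n) for n in range(1, budget + 1)]
           for h in responses]
    inf = float("inf")
    # cost[i][m]: minimal error of the first i responses using m samples.
    cost = [[inf] * (budget + 1) for _ in range(f + 1)]
    choice = [[0] * (budget + 1) for _ in range(f + 1)]
    cost[0][0] = 0.0
    for i in range(1, f + 1):
        for m in range(i, budget + 1):
            for n in range(1, m - (i - 1) + 1):
                c = cost[i - 1][m - n] + err[i - 1][n]
                if c < cost[i][m]:
                    cost[i][m] = c
                    choice[i][m] = n
    # Walk the choice table backwards to recover the lengths.
    lengths, m = [], budget
    for i in range(f, 0, -1):
        lengths.append(choice[i][m])
        m -= choice[i][m]
    return cost[f][budget], lengths[::-1]


def allocate_brute(responses, budget):
    """Exhaustive reference over all allocations; only for tiny inputs."""
    best = float("inf")
    for lens in product(range(1, budget + 1), repeat=len(responses)):
        if sum(lens) == budget:
            e = sum(best_segment_error(h, n) for h, n in zip(responses, lens))
            best = min(best, e)
    return best


if __name__ == "__main__":
    toy = [[0.0, 1.0, 0.5, -0.2, 0.1, 0.0],
           [0.3, -0.8, 0.4, 0.1, 0.0, 0.05]]
    print(allocate_dp(toy, 5))
```

Under these assumptions the table-filling step takes on the order of F·M² operations once the per-response error table is known, which is the kind of saving over exhaustive enumeration that motivates a dynamic-programming formulation.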

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taipei University of Technology, 1, Sec. 3, Chung-Hsiao East Rd., Taipei, Taiwan
Shingchern D. You & Woei-Kae Chen

Corresponding author

Correspondence to Shingchern D. You.

Appendix A. Computational complexity of exhaustive search

This appendix derives a lower bound on the time complexity of using an exhaustive search to find the optimal section IIs, i.e., the optimal starting points and lengths (\( N_i^{II} \)) of the section IIs. Let F be the number of impulse responses, N the number of coefficients in each response, and \( M = \sum\nolimits_{i = 1}^{F} N_i^{II} \); for simplicity, let N = M (in practice, M is proportional to N). An allocation of M is an assignment of values to \( N_1^{II}, \ldots, N_F^{II} \) that satisfies the constraint \( M = \sum\nolimits_{i = 1}^{F} N_i^{II} \). For example, if F = 5 and M = 100, one possible allocation is \( N_1^{II} = 1 \), \( N_2^{II} = 1 \), \( N_3^{II} = 1 \), \( N_4^{II} = 1 \), \( N_5^{II} = 96 \). In this case, since \( N_1^{II} = 1 \), section II of the first impulse response has N possible starting points. An exhaustive search must compute the modeling error for every distinct allocation and every combination of starting points, and record the one with the smallest error. We will show that there are \( \varOmega(M^{F - 1}) \) distinct allocations, and that each distinct allocation has \( \varOmega(M^{F - 1}) \) distinct combinations of starting points. Therefore, an exhaustive search must compute \( \varOmega(M^{F - 1} \times M^{F - 1}) = \varOmega(M^{2F - 2}) \) distinct modeling errors. Since computing one modeling error requires \( \varOmega(N) = \varOmega(M) \) time (Eq. 10 or Eq. 11), the complexity of an exhaustive search is \( \varOmega(M^{2F - 1}) \).

Consider the number of distinct allocations first. Because we only need a lower bound, it suffices to count a subset of the possible allocations: those satisfying \( 1 \leqslant N_1^{II} \leqslant \frac{M}{F} \), …, \( 1 \leqslant N_{F - 1}^{II} \leqslant \frac{M}{F} \). Under this restriction, \( N_1^{II} \) can take \( \frac{M}{F} \) different values, and so can each of \( N_2^{II}, \ldots, N_{F - 1}^{II} \). The restriction implies \( F - 1 \leqslant \sum\limits_{i = 1}^{F - 1} N_i^{II} \leqslant \frac{(F - 1)}{F}M \), so there always exists a value of \( N_F^{II} \) that satisfies \( M = \sum\nolimits_{i = 1}^{F} N_i^{II} \). In other words, every combination of values of \( N_1^{II}, \ldots, N_{F - 1}^{II} \) is legal and gives a distinct allocation. Therefore, there are \( \varOmega \left( \frac{M^{F - 1}}{F^{F - 1}} \right) \) distinct allocations. Since F is typically a constant (e.g., 5), \( F^{F - 1} \) is also a constant (e.g., \( 5^4 = 625 \)), and the number of distinct allocations simplifies to \( \varOmega(M^{F - 1}) \).
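As a sanity check on this counting argument, the following Python sketch enumerates allocations for small F and M. The function names are illustrative, and the sketch assumes every \( N_i^{II} \geqslant 1 \) and rounds M/F down to an integer.

```python
from itertools import product
from math import comb


def count_all_allocations(f, m):
    """Count tuples of f positive integers that sum to m, by enumeration."""
    return sum(1 for lens in product(range(1, m + 1), repeat=f)
               if sum(lens) == m)


def count_restricted_allocations(f, m):
    """Count allocations whose first f-1 entries lie in 1..m//f."""
    cap = m // f
    count = 0
    for head in product(range(1, cap + 1), repeat=f - 1):
        last = m - sum(head)  # the final entry is forced by the sum constraint
        if last >= 1:
            count += 1
    return count


if __name__ == "__main__":
    # Small case: the restricted subset versus all allocations.
    print(count_restricted_allocations(3, 12), count_all_allocations(3, 12))
    # F = 5, M = 100: restricted count versus the closed form C(M-1, F-1).
    print(count_restricted_allocations(5, 100), comb(99, 4))
```

For F = 5 and M = 100 the restricted subset contains (M/F)^(F−1) = 20⁴ allocations, which is smaller than the full count but grows at the same polynomial rate in M.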

We now count the possible starting points for each distinct allocation. In general, for the i-th impulse response, the number of distinct starting points depends on \( N_i^{II} \): it can be as small as 1 (when \( N_i^{II} = N \)) and as large as N (when \( N_i^{II} = 1 \)). For the first (F−1) impulse responses, however, we have imposed the restriction \( 1 \leqslant N_i^{II} \leqslant \frac{M}{F} \). When \( N_i^{II} = 1 \) and \( N_i^{II} = \frac{M}{F} \), the numbers of distinct starting points are N and \( N - \frac{M}{F} + 1 \), respectively. Because N = M and F is a constant, each of the first (F−1) impulse responses therefore has \( \varOmega(M) \) distinct starting points, and the first (F−1) impulse responses alone have \( \varOmega(M^{F - 1}) \) possible combinations of starting points. Note that we did not count the starting points of the F-th impulse response. This is safe because we are calculating a lower bound.
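The starting-point count can be checked with a short Python sketch. It assumes that a section of length L may start at any position where it fits entirely inside the N coefficients, which gives N − L + 1 positions and agrees with the two extremes stated above (1 when the length is N, N when the length is 1). The function names are illustrative.

```python
from math import prod


def num_starts(n, length):
    """Positions at which a contiguous block of `length` fits in n samples."""
    return n - length + 1


def start_combinations(n, lengths):
    """Number of ways to choose one starting point per impulse response."""
    return prod(num_starts(n, length) for length in lengths)


if __name__ == "__main__":
    n = 100
    # The example allocation for F = 5, M = 100.
    print(start_combinations(n, [1, 1, 1, 1, 96]))
    # Worst case under the restriction: the first F-1 lengths all equal M/F.
    print(start_combinations(n, [20, 20, 20, 20]))
```

Even in the worst restricted case each of the first F−1 responses still has a number of starting points linear in M, which is all the lower bound needs.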

Since there are \( \varOmega(M^{F - 1}) \) distinct allocations and each allocation has \( \varOmega(M^{F - 1}) \) distinct combinations of starting points, an exhaustive search must compute \( \varOmega(M^{F - 1} \times M^{F - 1}) = \varOmega(M^{2F - 2}) \) different modeling errors. Each modeling error takes \( \varOmega(N) = \varOmega(M) \) time to compute according to Eq. 10 or Eq. 11. Therefore, the complexity of an exhaustive search is \( \varOmega(M^{2F - 2} \times M) = \varOmega(M^{2F - 1}) \). If F = 5 (five-channel audio) and M = 220, the number of computations is at least proportional to \( M^9 \approx 1.2 \times 10^{21} \). On a computer that performs one square and one addition of Eq. 10 in \( 10^{-8} \) s, finding an optimal solution would take about \( 3.8 \times 10^5 \) years, which is clearly impractical.
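The arithmetic behind these figures can be reproduced in a few lines of Python. The sketch assumes a proportionality constant of 1 and a year of 365.25 days; the function name is illustrative.

```python
def exhaustive_search_years(m, f, seconds_per_op=1e-8):
    """Return (operation count, years) for M^(2F-1) unit operations."""
    operations = m ** (2 * f - 1)
    seconds = operations * seconds_per_op
    years = seconds / (365.25 * 24 * 3600)
    return operations, years


if __name__ == "__main__":
    ops, years = exhaustive_search_years(220, 5)
    print(f"{ops:.2e} operations, {years:.2e} years")
```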

About this article

Cite this article

You, S.D., Chen, WK. Optimally truncating head-related impulse response by dynamic programming with its applications. Multimed Tools Appl 70, 2167–2188 (2014). https://doi.org/10.1007/s11042-012-1234-6

Issue Date: June 2014