Optimally truncating head-related impulse response by dynamic programming with its applications

Multimedia Tools and Applications

Abstract

We propose a method to optimally truncate the head-related impulse responses (HRIRs) in this paper. The truncated HRIR consists of a portion of the original HRIR and a flat line. An algorithm based on dynamic programming is used to optimally select the portions of the original HRIRs and the constants of the flat lines to minimize the modeling errors. The truncated HRIRs can be used to reproduce multi-channel sound for headphones with a significantly lower computational cost. The proposed method is compared with another approximation method, the CAPZ (Common-Acoustical-Pole and Zero) approach. The experimental results show that the proposed method yields lower composition as well as modeling errors for the same amount of computation. Compared with the direct implementation, the proposed approach requires about 35 % of the computational cost while maintaining acceptable composition errors.





Author information


Correspondence to Shingchern D. You.

Appendix A. Computational complexity of exhaustive search


This appendix briefly discusses a lower bound on the time complexity of an exhaustive search for the optimal section IIs, i.e., their optimal starting points and lengths \( N_i^{II} \). Let the number of impulse responses be F, the number of coefficients in a response be N, \( M = \sum\nolimits_{i = 1}^F N_i^{II} \), and, for simplicity, N = M (in practice, M is proportional to N). An allocation of M is an assignment of values to \( N_1^{II}, \ldots, N_F^{II} \) satisfying the constraint \( M = \sum\nolimits_{i = 1}^F N_i^{II} \). For example, if F = 5 and M = 100, a possible allocation is \( N_1^{II} = 1 \), \( N_2^{II} = 1 \), \( N_3^{II} = 1 \), \( N_4^{II} = 1 \), \( N_5^{II} = 96 \). In this case, since \( N_1^{II} = 1 \), section II of the first impulse response has a total of N possible starting points. An exhaustive search must compute the modeling errors of all distinct allocations and starting points, and record the one with the smallest error. We will show that there exist \( \varOmega(M^{F - 1}) \) distinct allocations, and that each of these allocations has \( \varOmega(M^{F - 1}) \) distinct combinations of starting points. Therefore, an exhaustive search must compute \( \varOmega(M^{F - 1} \times M^{F - 1}) = \varOmega(M^{2F - 2}) \) distinct modeling errors. Since computing a particular modeling error requires \( \varOmega(N) = \varOmega(M) \) time (Eq. 10 or Eq. 11), the complexity of an exhaustive search is \( \varOmega(M^{2F - 1}) \).

Consider the number of distinct allocations first. Since we are concerned with a lower bound, we do not need to count all possible distinct allocations. Instead, we consider a subset of them, namely those satisfying the restriction \( 1 \leqslant N_1^{II} \leqslant \frac{M}{F} \), …, \( 1 \leqslant N_{F - 1}^{II} \leqslant \frac{M}{F} \). Under this restriction, \( N_1^{II} \) has \( \frac{M}{F} \) possible values, and so does each of \( N_2^{II}, \ldots, N_{F - 1}^{II} \). Note that this restriction implies \( F - 1 \leqslant \sum\limits_{i = 1}^{F - 1} N_i^{II} \leqslant \frac{(F - 1)}{F}M \), so the remaining length \( N_F^{II} = M - \sum\nolimits_{i = 1}^{F - 1} N_i^{II} \) is at least \( \frac{M}{F} \geqslant 1 \) and there always exists a value of \( N_F^{II} \) that satisfies \( M = \sum\nolimits_{i = 1}^F N_i^{II} \). In other words, all combinations of possible values of \( N_1^{II}, \ldots, N_{F - 1}^{II} \) are legal and are distinct allocations. Therefore, there exist \( \varOmega \left( \frac{M^{F - 1}}{F^{F - 1}} \right) \) distinct allocations. Since F is typically a constant (e.g., 5), \( F^{F - 1} \) is also a constant (e.g., \( 5^4 = 625 \)). The number of distinct allocations can therefore be simplified to \( \varOmega(M^{F - 1}) \).
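The counting argument can be checked by brute force for small values. The following is an illustrative sketch only, not part of the paper: the function names are hypothetical, every length is assumed to be at least 1, and the closed form \( \binom{M - 1}{F - 1} \) used for comparison is the standard count of ways to split M into F positive parts rather than a result stated in the text.

```python
from itertools import product
from math import comb


def count_allocations(M, F):
    """Count all (N_1, ..., N_F) with every N_i >= 1 and sum equal to M."""
    count = 0
    # Choose the first F-1 lengths freely; the last one is then forced.
    for head in product(range(1, M + 1), repeat=F - 1):
        if M - sum(head) >= 1:
            count += 1
    return count


def count_restricted_allocations(M, F):
    """Count allocations whose first F-1 lengths lie in 1..floor(M/F)."""
    cap = M // F
    count = 0
    for head in product(range(1, cap + 1), repeat=F - 1):
        # The forced last length is at least M/F >= 1, so every choice is legal.
        assert M - sum(head) >= 1
        count += 1
    return count


M, F = 12, 3  # small illustrative values, not the paper's
total = count_allocations(M, F)
restricted = count_restricted_allocations(M, F)
print(total, comb(M - 1, F - 1))        # all allocations vs. closed form
print(restricted, (M // F) ** (F - 1))  # restricted subset vs. (M/F)^(F-1)
```

The restricted subset is smaller than the full set of allocations, which is all the lower-bound argument needs.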

We now calculate the number of possible starting points for each distinct allocation. Normally, for the i-th impulse response, depending on the value of \( N_i^{II} \), the number of distinct starting points can be as small as 1 (when \( N_i^{II} = N \)) and as large as N (when \( N_i^{II} = 1 \)). However, for the first (F−1) impulse responses, we have imposed the restriction \( 1 \leqslant N_i^{II} \leqslant \frac{M}{F} \). When \( N_i^{II} = 1 \) and \( N_i^{II} = \frac{M}{F} \), the numbers of distinct starting points are N and \( N - \frac{M}{F} + 1 \), respectively. Because N = M, even the smaller of these exceeds \( \frac{F - 1}{F}M \), so each of the first (F−1) impulse responses has \( \varOmega(M) \) distinct starting points. Therefore, the first (F−1) impulse responses alone have \( \varOmega(M^{F - 1}) \) possible combinations of starting points. Note that we did not count the starting points of the F-th impulse response. This is safe because we are calculating a lower bound.
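As a rough illustration of this step, the sketch below counts starting points under the assumption that section II is a contiguous window lying entirely inside the N-sample response, so that a window of length L has N − L + 1 possible positions. This matches the two extremes quoted above (1 position when L = N, N positions when L = 1). The function names are hypothetical and do not come from the paper.

```python
def num_starting_points(N, length):
    """Number of positions where a window of `length` samples fits in N samples."""
    return N - length + 1


def min_start_combinations(N, M, F):
    """Lower bound on start-point combinations over the first F-1 responses.

    Each of the first F-1 sections has length at most floor(M/F), and a
    longer section has fewer starting points, so the longest allowed
    section gives the smallest count per response.
    """
    cap = M // F
    return num_starting_points(N, cap) ** (F - 1)


N = M = 220  # N = M, as assumed in the appendix
F = 5
print(num_starting_points(N, 1))       # shortest section: N positions
print(num_starting_points(N, M // F))  # longest restricted section
print(min_start_combinations(N, M, F))
```

With F fixed, the per-response count grows linearly in M, so the product over F − 1 responses grows as \( M^{F - 1} \).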

Since there are \( \varOmega(M^{F - 1}) \) distinct allocations and each allocation has \( \varOmega(M^{F - 1}) \) distinct combinations of starting points, an exhaustive search must compute \( \varOmega(M^{F - 1} \times M^{F - 1}) = \varOmega(M^{2F - 2}) \) different possible modeling errors. Each modeling error takes \( \varOmega(N) = \varOmega(M) \) time to compute according to Eq. 10 or Eq. 11. Therefore, the complexity of an exhaustive search becomes \( \varOmega(M^{2F - 2} \times M) = \varOmega(M^{2F - 1}) \). If F = 5 (five-channel audio) and M = 220, the number of computations is at least proportional to \( M^9 = 1.2 \times 10^{21} \). For a computer that can compute one square and one addition in Eq. 10 in \( 10^{-8} \) s, it would take about \( 3.8 \times 10^5 \) years to find an optimal solution, which is clearly impractical.
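The closing arithmetic can be reproduced in a few lines. This is only a sanity check of the figures quoted above; the 365-day year and the variable names are assumptions made for the illustration.

```python
M, F = 220, 5
SECONDS_PER_OP = 1e-8               # one square and one addition, as stated above
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year

operations = M ** (2 * F - 1)       # M^(2F-1) = 220^9
years = operations * SECONDS_PER_OP / SECONDS_PER_YEAR

print(f"{operations:.1e} operations, about {years:.1e} years")
# prints: 1.2e+21 operations, about 3.8e+05 years
```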


About this article

Cite this article

You, S.D., Chen, WK. Optimally truncating head-related impulse response by dynamic programming with its applications. Multimed Tools Appl 70, 2167–2188 (2014). https://doi.org/10.1007/s11042-012-1234-6


  • DOI: https://doi.org/10.1007/s11042-012-1234-6
