Abstract
Many perceptual models for audio reconstruction have been proposed to create virtual sound, but the direction of the virtual sound may deviate from the desired direction because the binaural cues are distorted. In this paper, an equation relating the binaural cues of a real sound to those of a virtual sound reproduced by two loudspeakers is established, and weight vectors are derived from it based on the head-related transfer function (HRTF). After being filtered by the weight vectors, the sound signals emitted from the loudspeakers can deliver an accurate spatial impression to the listener. However, HRTFs differ between listeners, so the weight vectors calculated from them also vary from person to person. Therefore, a radial basis function neural network (RBFNN) is designed to personalize the weight vectors for each listener. Compared with three existing methods, namely vector base amplitude panning (VBAP), HRTF-based panning (HP), and band-based panning (BP), the proposed method reproduces binaural cues more accurately, and a subjective test indicates that there is no significant difference in perception between real sound and the virtual sound produced by the proposed method.
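As a rough illustration of how HRTF-based weight vectors for two loudspeakers could be obtained, the sketch below solves, for each frequency bin, a 2x2 linear system that makes the loudspeaker-reproduced ear spectra equal those of the real source, so that the level and phase differences between the ears are preserved. This is an assumed formulation for illustration only; the abstract does not give the paper's binaural cue equation, and it may differ. The function names (`solve_weights`, `reproduce`, `ild_db`, `ipd_rad`) are hypothetical, and the HRTF values are synthetic placeholders rather than measured data.

```python
import cmath
import math


def solve_weights(h_target, h_spk1, h_spk2):
    """Return per-bin complex weights (w1, w2) for the two loudspeakers.

    Each argument is a (left_ear, right_ear) pair of lists holding one
    complex HRTF value per frequency bin. Assumed model, per bin:
        al*w1 + bl*w2 = tl   (left ear)
        ar*w1 + br*w2 = tr   (right ear)
    """
    tl, tr = h_target
    al, ar = h_spk1
    bl, br = h_spk2
    w1, w2 = [], []
    for k in range(len(tl)):
        # Cramer's rule on the 2x2 system for this bin.
        det = al[k] * br[k] - bl[k] * ar[k]
        if abs(det) < 1e-12:
            raise ValueError(f"ill-conditioned HRTF matrix at bin {k}")
        w1.append((tl[k] * br[k] - bl[k] * tr[k]) / det)
        w2.append((al[k] * tr[k] - tl[k] * ar[k]) / det)
    return w1, w2


def reproduce(w1, w2, h_spk1, h_spk2):
    """Ear spectra produced when the weighted loudspeakers play together."""
    (al, ar), (bl, br) = h_spk1, h_spk2
    left = [w1[k] * al[k] + w2[k] * bl[k] for k in range(len(w1))]
    right = [w1[k] * ar[k] + w2[k] * br[k] for k in range(len(w1))]
    return left, right


def ild_db(left, right):
    """Interaural level difference in dB for one bin."""
    return 20.0 * math.log10(abs(left) / abs(right))


def ipd_rad(left, right):
    """Interaural phase difference in radians for one bin."""
    return cmath.phase(left * right.conjugate())


# Synthetic single-bin HRTFs (placeholders, NOT measured data).
near = 1.0 + 0.0j                  # ear on the same side as the loudspeaker
far = 0.5 * cmath.exp(-0.6j)       # opposite ear: attenuated and delayed
h_spk1 = ([near], [far])           # left loudspeaker
h_spk2 = ([far], [near])           # right loudspeaker
h_target = ([0.9 * cmath.exp(-0.1j)], [0.7 * cmath.exp(-0.3j)])

w1, w2 = solve_weights(h_target, h_spk1, h_spk2)
left, right = reproduce(w1, w2, h_spk1, h_spk2)
print("ILD real vs virtual (dB):",
      ild_db(h_target[0][0], h_target[1][0]), ild_db(left[0], right[0]))
print("IPD real vs virtual (rad):",
      ipd_rad(h_target[0][0], h_target[1][0]), ipd_rad(left[0], right[0]))
```

Because the weights depend directly on the HRTF values, a different listener's HRTFs yield different weights under this formulation, which is consistent with the motivation given above for personalizing the weight vectors.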
This work is supported by the National Natural Science Foundation of China (No. 61671335) and the Technological Innovation Major Project of Hubei Province (No. 2017AAA123).
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zheng, J., Tu, W., Zhang, X., Yang, W., Zhai, S., Shen, C. (2018). A Sound Image Reproduction Model Based on Personalized Weight Vectors. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. Lecture Notes in Computer Science, vol 11165. Springer, Cham. https://doi.org/10.1007/978-3-030-00767-6_56
DOI: https://doi.org/10.1007/978-3-030-00767-6_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00766-9
Online ISBN: 978-3-030-00767-6
eBook Packages: Computer Science (R0)