Abstract
The head-related transfer function (HRTF) can be viewed as a filter that describes how sound arriving from an arbitrary spatial direction is transformed on its way to the listener’s eardrums. HRTFs can be used to synthesize vivid virtual 3D sound that appears to come from any spatial location, which gives them an important role in 3D audio technology. However, the complexity and variability of the auditory cues encoded in HRTFs make it difficult to build an accurate mathematical model with conventional methods. In this paper, we propose an HRTF representation model based on a convolutional auto-encoder (CAE), a type of auto-encoder with convolutional layers in the encoder and deconvolution layers in the decoder. Experimental evaluation on the ARI HRTF database shows that the proposed model performs well at dimensionality reduction of HRTFs.
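To make the encoder/decoder structure concrete, the following is a minimal pure-Python sketch of a one-layer convolutional auto-encoder acting on a single HRTF magnitude vector. It is not the architecture from the paper: the 16-bin input, the kernel values, the stride of 4, and the helper names `conv1d`, `conv_transpose1d`, `relu`, and `mse` are all assumptions chosen for illustration, and the weights are fixed rather than trained.

```python
# Illustrative sketch only: input size, kernels, and stride are assumptions,
# not the configuration used in the paper. Weights are fixed, not learned.

def conv1d(x, kernel, stride):
    """Strided 1D convolution without padding (cross-correlation form)."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]

def conv_transpose1d(z, kernel, stride):
    """Transposed ("deconvolution") 1D layer: each code value is spread
    through the kernel and accumulated into the upsampled output."""
    k = len(kernel)
    out = [0.0] * ((len(z) - 1) * stride + k)
    for i, v in enumerate(z):
        for j in range(k):
            out[i * stride + j] += v * kernel[j]
    return out

def relu(v):
    """Element-wise rectified linear activation."""
    return [max(0.0, a) for a in v]

def mse(a, b):
    """Mean squared reconstruction error, the usual auto-encoder objective."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Hypothetical 16-bin magnitude response standing in for one HRTF.
hrtf = [float((n * 7) % 5) for n in range(16)]

enc_kernel = [0.25, 0.25, 0.25, 0.25]  # assumed encoder weights (block average)
dec_kernel = [1.0, 1.0, 1.0, 1.0]      # assumed decoder weights (block repeat)

code = relu(conv1d(hrtf, enc_kernel, stride=4))       # encoder: 16 values -> 4
recon = conv_transpose1d(code, dec_kernel, stride=4)  # decoder: 4 values -> 16
error = mse(hrtf, recon)
```

Here the 4-value `code` plays the role of the low-dimensional representation, and `error` is the quantity a training procedure would minimize by adjusting the kernels. A real model would stack several such layers and learn the weights from the database.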
Acknowledgment
This work is supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (No. 61701194, No. U1736206).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, W., Hu, R., Wang, X., Li, D. (2020). HRTF Representation with Convolutional Auto-encoder. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_49
DOI: https://doi.org/10.1007/978-3-030-37731-1_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37730-4
Online ISBN: 978-3-030-37731-1
eBook Packages: Computer Science, Computer Science (R0)