Abstract
Spatial audio is becoming increasingly popular in domestic and mobile multimedia applications. Evaluating the quality of experience (QoE) of such applications requires algorithms capable of identifying and quantifying the perceptual characteristics of spatial audio scenes. This paper introduces a method for the automatic categorization of surround sound recordings, using a criterion based on the distribution of foreground and background audio content around a listener. The principles of the method were demonstrated in a study in which a corpus of 110 five-channel surround sound recordings was computationally classified according to two basic spatial audio scene categories. To develop the proposed method, a novel metric representing spatial audio characteristics was identified. Moreover, five machine learning algorithms, including neural networks, random forests and support vector machines, were employed and their performance was compared. According to the obtained results, the proposed method categorized surround sound recordings with an accuracy reaching 99%.
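The abstract does not specify the proposed metric or the classifier settings, so the following is only a minimal sketch of the general idea of categorizing a five-channel recording by how its energy is distributed around the listener. The front-to-rear energy ratio, the channel order (L, R, C, LS, RS), the 6 dB threshold and the two labels are all illustrative assumptions, not the paper's metric or categories, and a fixed threshold stands in for the machine learning classifiers used in the study.

```python
import math

# Assumed channel order for a five-channel recording: L, R, C, LS, RS.
FRONT = (0, 1, 2)
REAR = (3, 4)


def energy(samples):
    """Sum of squared sample values for one channel."""
    return sum(s * s for s in samples)


def front_rear_ratio_db(channels, eps=1e-12):
    """Hypothetical feature: front-to-rear energy ratio in decibels."""
    front = sum(energy(channels[i]) for i in FRONT)
    rear = sum(energy(channels[i]) for i in REAR)
    return 10.0 * math.log10((front + eps) / (rear + eps))


def categorize(channels, threshold_db=6.0):
    """Toy rule: compare the feature with an assumed threshold.

    The labels and the 6 dB threshold are illustrative only.
    """
    ratio = front_rear_ratio_db(channels)
    return "front-dominant" if ratio > threshold_db else "surrounding"


# Synthetic signals: a 440 Hz tone at 48 kHz and an attenuated copy.
tone = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
quiet = [0.1 * s for s in tone]

front_heavy = [tone, tone, tone, quiet, quiet]  # rear channels much quieter
all_around = [tone, tone, tone, tone, tone]     # equal level in all channels

print(categorize(front_heavy))
print(categorize(all_around))
```

In a setup closer to the one described in the abstract, a feature of this kind would be computed for each recording in the corpus and passed to a trained classifier instead of being compared with a hand-picked threshold.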
Acknowledgments
This work was supported by grant S/WI/3/2013 from Bialystok University of Technology, funded from the research resources of the Ministry of Science and Higher Education.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zieliński, S.K. (2019). Spatial Audio Scene Characterization (SASC). In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_46
DOI: https://doi.org/10.1007/978-3-319-98678-4_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98677-7
Online ISBN: 978-3-319-98678-4
eBook Packages: Intelligent Technologies and Robotics (R0)