Spatial Audio Scene Characterization (SASC)

Zieliński, Sławomir K.

doi:10.1007/978-3-319-98678-4_46

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 833)

Included in the following conference series:

International Conference on Multimedia and Network Information System

Abstract

Spatial audio is becoming increasingly popular in domestic and mobile multimedia applications. Evaluating the quality of experience (QoE) of such applications requires algorithms capable of identifying and quantifying the perceptual characteristics of spatial audio scenes. This paper introduces a method for the automatic categorization of surround sound recordings using a criterion based on the distribution of foreground and background audio content around a listener. The principles of the method were demonstrated in a study in which a corpus of 110 five-channel surround sound recordings was computationally classified according to the two basic spatial audio scene categories. To develop the proposed method, a novel metric representing spatial audio characteristics was identified. Moreover, five machine learning algorithms, including neural networks, random forests and support vector machines, were employed and their performance compared. According to the obtained results, the proposed method was capable of categorizing surround sound recordings with an accuracy of 99%.
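
To make the idea of classifying a five-channel recording by how its content is distributed around the listener more concrete, the following is a minimal sketch only. It does not reproduce the metric or the machine learning models from the paper, whose details are not given in the abstract. The feature (a front-to-surround energy ratio), the channel names, the 6 dB threshold, the two class labels, and the synthetic noise signals are all assumptions introduced purely for illustration.

```python
import math
import random

# Assumed channel naming for a five-channel (3 front / 2 surround) recording.
FRONT = ("L", "R", "C")
SURROUND = ("Ls", "Rs")


def mean_square(samples):
    """Mean-square level (average power) of one channel."""
    return sum(x * x for x in samples) / len(samples)


def front_to_surround_ratio_db(recording):
    """Hypothetical feature: front vs. surround energy ratio in dB.

    This is NOT the metric proposed in the paper; it merely stands in for
    a single number describing where the audio energy is located.
    """
    front = sum(mean_square(recording[ch]) for ch in FRONT)
    surround = sum(mean_square(recording[ch]) for ch in SURROUND)
    eps = 1e-12  # avoids division by zero and log of zero for silent channels
    return 10.0 * math.log10((front + eps) / (surround + eps))


def classify(recording, threshold_db=6.0):
    """Toy two-class rule; the threshold and the labels are illustrative."""
    if front_to_surround_ratio_db(recording) > threshold_db:
        return "front_dominant"
    return "surrounding"


def make_recording(front_gain, surround_gain, n=2000, seed=0):
    """Synthetic Gaussian-noise recording with per-group gains (test data only)."""
    rng = random.Random(seed)
    gains = {ch: front_gain for ch in FRONT}
    gains.update({ch: surround_gain for ch in SURROUND})
    return {ch: [g * rng.gauss(0.0, 1.0) for _ in range(n)]
            for ch, g in gains.items()}


# Quiet surrounds (about 20 dB below the front channels) vs. equal levels.
print(classify(make_recording(1.0, 0.1)))  # front_dominant
print(classify(make_recording(1.0, 1.0)))  # surrounding
```

A fixed threshold on one feature is the simplest possible classifier. A study like the one described would instead fit several learned models on a labelled corpus and compare their accuracy on held-out data.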

Acknowledgments

This work was supported by grant S/WI/3/2013 from Białystok University of Technology and funded from the research resources of the Ministry of Science and Higher Education.

Author information

Authors and Affiliations

Faculty of Computer Science, Białystok University of Technology, Białystok, Poland
Sławomir K. Zieliński

Corresponding author

Correspondence to Sławomir K. Zieliński.

Editor information

Editors and Affiliations

Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Kazimierz Choroś
Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Marek Kopel
Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Elżbieta Kukla
Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Andrzej Siemiński

About this paper

Cite this paper

Zieliński, S.K. (2019). Spatial Audio Scene Characterization (SASC). In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_46

DOI: https://doi.org/10.1007/978-3-319-98678-4_46
Published: 15 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98677-7
Online ISBN: 978-3-319-98678-4
eBook Packages: Intelligent Technologies and Robotics (R0)
