Spatial Audio Scene Characterization (SASC)

Automatic Classification of Five-Channel Surround Sound Recordings According to the Foreground and Background Content

  • Conference paper
Multimedia and Network Information Systems (MISSI 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 833)

Abstract

Spatial audio is becoming increasingly popular in domestic and mobile multimedia applications. Evaluating the quality of experience (QoE) of such applications requires algorithms capable of identifying and quantifying the perceptual characteristics of spatial audio scenes. This paper introduces a method for automatically categorizing surround sound recordings using a criterion based on the distribution of foreground and background audio content around a listener. The principles of the method were demonstrated in a study in which a corpus of 110 five-channel surround sound recordings was computationally classified according to the two basic spatial audio scene categories. To develop the proposed method, a novel metric representing spatial audio characteristics was identified. Moreover, five machine learning algorithms, including neural networks, random forests, and support vector machines, were employed and their performance compared. According to the obtained results, the proposed method categorized surround sound recordings with an accuracy reaching 99%.
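
The abstract does not specify the metric or the evaluation protocol, so the following is only a minimal sketch of the general workflow it describes: extract a spatial feature from each five-channel recording, train a classifier, and report its accuracy. Everything concrete here is an assumption made for illustration: the channel order (L, R, C, LS, RS), the `rear_energy_ratio` feature, the synthetic noise recordings, the class labels, and the nearest-centroid classifier (a simple stand-in, not one of the algorithms named in the paper).

```python
import math
import random

# Assumed channel order (hypothetical): L, R, C, LS, RS.
REAR = (3, 4)

def rms(samples):
    """Root-mean-square level of one channel."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def rear_energy_ratio(channels):
    """Hypothetical feature: share of total energy carried by the rear channels."""
    energy = [rms(ch) ** 2 for ch in channels]
    total = sum(energy)
    return sum(energy[i] for i in REAR) / total if total > 0 else 0.0

def make_recording(rng, rear_gain, n=256):
    """Synthetic five-channel noise; rear channels are scaled by rear_gain."""
    return [
        [rng.gauss(0.0, 1.0) * (rear_gain if i in REAR else 1.0) for _ in range(n)]
        for i in range(5)
    ]

def nearest_centroid_loo(features, labels):
    """Leave-one-out accuracy of a one-dimensional nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        centroids = {}
        for label in set(labels):
            # Class mean computed without the held-out recording i.
            vals = [f for j, (f, l) in enumerate(zip(features, labels))
                    if j != i and l == label]
            centroids[label] = sum(vals) / len(vals)
        predicted = min(centroids, key=lambda c: abs(centroids[c] - x))
        correct += predicted == y
    return correct / len(features)

rng = random.Random(0)  # fixed seed so the synthetic corpus is reproducible
labels = ["quiet_rear"] * 20 + ["loud_rear"] * 20  # hypothetical class names
recordings = ([make_recording(rng, 0.2) for _ in range(20)]
              + [make_recording(rng, 1.0) for _ in range(20)])
features = [rear_energy_ratio(r) for r in recordings]
accuracy = nearest_centroid_loo(features, labels)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

The synthetic classes are deliberately easy to separate, so the accuracy printed here says nothing about real recordings; the point is only the shape of the pipeline, in which alternative classifiers could be swapped in for `nearest_centroid_loo` and compared on the same features.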

Acknowledgments

This work was supported by grant S/WI/3/2013 from Bialystok University of Technology and funded from the research resources of the Ministry of Science and Higher Education.

Author information

Corresponding author

Correspondence to Sławomir K. Zieliński.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zieliński, S.K. (2019). Spatial Audio Scene Characterization (SASC). In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_46
