Acoustic Cues for the Perceptual Assessment of Surround Sound

Siegert, Ingo; Jokisch, Oliver; Lotz, Alicia Flores; Trojahn, Franziska; Meszaros, Martin; Maruschke, Michael

doi:10.1007/978-3-319-66429-3_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Speech and audio codecs are used in a variety of multimedia applications, and the first streaming and cloud-based services now offer multichannel sound. Besides perceptual quality, coding-related research focuses on low bitrate and minimal latency. The IETF-standardized Opus codec provides high perceptual quality, low latency, and the capability of coding multiple channels in various audio bandwidths up to fullband (20 kHz). In a previous perceptual study on Opus-processed 5.1 surround sound, uncompressed and degraded stimuli were rated on a five-point degradation category scale (DMOS) for six channels at total bitrates between 96 and 192 kbit/s. That study revealed that the perceived quality depends on the music characteristics. In the current study, we analyze spectral and music-feature differences between the five music stimuli of that study, at three coding bitrates and as uncompressed sound, to identify objective causes for the perceptual differences. The results show that samples with annoying audible degradations exhibit larger spectral differences within the LFE (low-frequency effects) channel as well as highly uncorrelated LSPs.
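The abstract does not say which spectral difference measure was used. As an illustration only, the sketch below compares an uncompressed and a degraded six-channel signal channel by channel using the log-spectral distance, which is a common choice but an assumption here, not the authors' method. The channel order, frame size, hop size, and all function names are hypothetical, and the input is synthetic noise rather than real Opus output.

```python
import numpy as np

# Assumed 5.1 channel order (hypothetical; real files may use a different layout).
CHANNELS = ["L", "R", "C", "LFE", "Ls", "Rs"]


def frame_spectra_db(x, n_fft=1024, hop=512):
    """Return the power spectrum in dB for each Hann-windowed frame of a mono signal."""
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-12)  # small offset avoids log(0)


def log_spectral_distance(ref, deg, n_fft=1024, hop=512):
    """Mean over frames of the RMS difference (in dB) between two mono signals."""
    n = min(len(ref), len(deg))
    diff = frame_spectra_db(ref[:n], n_fft, hop) - frame_spectra_db(deg[:n], n_fft, hop)
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))


def per_channel_lsd(ref, deg):
    """Compare two signals of shape (n_samples, 6) and return one distance per channel."""
    return {
        name: log_spectral_distance(ref[:, i], deg[:, i])
        for i, name in enumerate(CHANNELS)
    }


# Synthetic demonstration: only the LFE channel is disturbed.
rng = np.random.default_rng(0)
reference = rng.standard_normal((48000, 6))
degraded = reference.copy()
degraded[:, CHANNELS.index("LFE")] += 0.5 * rng.standard_normal(48000)

for channel, distance in per_channel_lsd(reference, degraded).items():
    print(f"{channel:>3}: {distance:.2f} dB")
```

In this toy setup the untouched channels report a distance of zero and only the LFE channel stands out. With real stimuli, the degraded signal would be the decoded Opus output, time-aligned to the uncompressed reference before comparison.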

Acknowledgments

This work was partly carried out within the Transregional Collaborative Research Centre SFB/TRR 62 “Companion Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG) (www.sfb-trr-62.de).

Author information

Authors and Affiliations

Cognitive Systems Group, Institute of Information and Communication Engineering, Otto von Guericke University, 39016, Magdeburg, Germany
Ingo Siegert & Alicia Flores Lotz
Institute of Communications Engineering, Leipzig University of Telecommunications, 04277, Leipzig, Germany
Oliver Jokisch, Franziska Trojahn, Martin Meszaros & Michael Maruschke

Corresponding authors

Correspondence to Ingo Siegert or Oliver Jokisch.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Siegert, I., Jokisch, O., Lotz, A.F., Trojahn, F., Meszaros, M., Maruschke, M. (2017). Acoustic Cues for the Perceptual Assessment of Surround Sound. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science (LNAI), vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_6

DOI: https://doi.org/10.1007/978-3-319-66429-3_6
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)