Abstract
Speech and audio codecs are used in a wide range of multimedia applications, and the first streaming and cloud-based services now offer multichannel sound. Besides perceptual quality, coding-related research focuses on low bitrate and minimal latency. The IETF-standardized Opus codec provides high perceptual quality, low latency, and the ability to code multiple channels at various audio bandwidths up to fullband (20 kHz). In a previous perceptual study on Opus-processed 5.1 surround sound, uncompressed and degraded six-channel stimuli were rated on a five-point degradation category scale (DMOS) at total bitrates between 96 and 192 kbit/s. That study revealed that the perceived quality depends on the characteristics of the music. In the current study, we analyze spectral and music-feature differences between the five music stimuli of that study, each coded at three bitrates, and the uncompressed sound in order to identify objective causes of the perceptual differences. The results show that samples with annoying audible degradations exhibit larger spectral differences in the low-frequency effects (LFE) channel as well as highly uncorrelated line spectral pairs (LSPs).
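The kind of comparison described above can be illustrated with a minimal sketch. The abstract does not state which spectral distance or correlation measure was used, so the log-spectral distance and the Pearson correlation below are assumptions chosen for illustration, and the function names `magnitude_spectrum`, `log_spectral_distance`, and `pearson` are hypothetical. The sketch compares one frame of an uncompressed channel (for example the LFE channel) with its coded counterpart, and correlates two feature tracks such as one LSP coefficient over time.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def log_spectral_distance(reference, degraded, eps=1e-12):
    """RMS difference in dB between the magnitude spectra of two equal-length frames.

    Assumption: this measure is illustrative, not the one used in the study.
    """
    ref_mag = magnitude_spectrum(reference)
    deg_mag = magnitude_spectrum(degraded)
    squared = [
        (20.0 * math.log10((r + eps) / (d + eps))) ** 2
        for r, d in zip(ref_mag, deg_mag)
    ]
    return math.sqrt(sum(squared) / len(squared))


def pearson(a, b):
    """Pearson correlation of two equal-length feature tracks (0.0 if one is constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Synthetic stand-in for one 64-sample frame of a low-frequency channel.
    original = [math.sin(2 * math.pi * 2 * t / 64) for t in range(64)]
    coded = [0.5 * x for x in original]  # crude stand-in for a coding degradation
    print(log_spectral_distance(original, coded))
    print(pearson(original, coded))
```

In such a sketch, a larger distance for one channel and a correlation close to zero for a feature track would correspond to the kind of objective difference the abstract associates with annoying degradations; real signals would additionally need framing, windowing, and a fast FFT implementation.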
Acknowledgments
This work was partly carried out within the Transregional Collaborative Research Centre SFB/TRR 62 “Companion Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG) (www.sfb-trr-62.de).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Siegert, I., Jokisch, O., Lotz, A.F., Trojahn, F., Meszaros, M., Maruschke, M. (2017). Acoustic Cues for the Perceptual Assessment of Surround Sound. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_6
DOI: https://doi.org/10.1007/978-3-319-66429-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science (R0)