Abstract
Music signals can nowadays be recorded and processed by many different devices, either to extract additional information such as instruments and genre or to reuse parts of the signals in various applications. Recording quality therefore has a strong impact on all kinds of Music Information Retrieval (MIR) signal processing and on its results. In this work, the recording quality of piano music is estimated by three separate neural network approaches, one each for background noise, sound disturbances, and reverberation. The approaches for background noise and sound disturbances estimate the resulting Signal to Noise Ratio (SNR) of the music piece: the former for a constant SNR and the latter for the time-dependent case. Reverberation is characterized by two room-acoustic parameters, the reverberation time and the early decay time. As an example application, the SNR estimates are validated on piano music transcription by analysing how the estimated recording quality affects the automatic transcription results. These results indicate that piano music transcription performance can be predicted from the recording quality parameters.
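The abstract names three target quantities: a constant SNR, a time-dependent SNR, and the reverberation time and early decay time (EDT). The sketch below shows how such reference values are conventionally computed when the clean signal, the noise, and a room impulse response are available separately (for example, when degradations are added synthetically). It is not the authors' neural network estimators. The function names, the frame length and hop size, and the evaluation ranges (0 to -10 dB for EDT, -5 to -35 dB for a T30-style reverberation time, one common convention) are illustrative assumptions.

```python
import numpy as np


def global_snr_db(clean: np.ndarray, noise: np.ndarray) -> float:
    """Constant-SNR case: ratio of total signal energy to total noise energy, in dB."""
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))


def framewise_snr_db(clean: np.ndarray, noise: np.ndarray,
                     frame_len: int = 2048, hop: int = 1024,
                     eps: float = 1e-12) -> np.ndarray:
    """Time-dependent case: one SNR value (dB) per frame, frames taken without padding."""
    if len(clean) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(clean) - frame_len) // hop
    snr = np.empty(n_frames)
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame_len)
        p_signal = np.sum(clean[seg] ** 2) + eps  # eps avoids log of zero in silent frames
        p_noise = np.sum(noise[seg] ** 2) + eps
        snr[i] = 10.0 * np.log10(p_signal / p_noise)
    return snr


def schroeder_decay_db(rir: np.ndarray) -> np.ndarray:
    """Schroeder backward integration of the squared impulse response, normalised to 0 dB."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0] + 1e-300)


def decay_time(edc_db: np.ndarray, fs: float,
               start_db: float, stop_db: float) -> float:
    """Fit a line to the decay curve between start_db and stop_db and
    extrapolate the slope to a 60 dB decay (result in seconds)."""
    idx = np.where((edc_db <= start_db) & (edc_db >= stop_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # dB per second, negative
    return -60.0 / slope


def early_decay_time(rir: np.ndarray, fs: float) -> float:
    return decay_time(schroeder_decay_db(rir), fs, 0.0, -10.0)


def reverberation_time_t30(rir: np.ndarray, fs: float) -> float:
    return decay_time(schroeder_decay_db(rir), fs, -5.0, -35.0)
```

For an idealised, purely exponential impulse response whose energy falls by 60 dB in 0.5 s, both `early_decay_time` and `reverberation_time_t30` return approximately 0.5 s. In real rooms the two values differ, which is why EDT and reverberation time are treated as separate parameters.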
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schwabe, M., Hoffmann, T., Murgul, S., Heizmann, M. (2022). Estimation of Music Recording Quality to Predict Automatic Music Transcription Performance. In: Berretti, S., Su, GM. (eds) Smart Multimedia. ICSM 2022. Lecture Notes in Computer Science, vol 13497. Springer, Cham. https://doi.org/10.1007/978-3-031-22061-6_24
DOI: https://doi.org/10.1007/978-3-031-22061-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22060-9
Online ISBN: 978-3-031-22061-6
eBook Packages: Computer Science, Computer Science (R0)