Estimation of Music Recording Quality to Predict Automatic Music Transcription Performance

Schwabe, Markus; Hoffmann, Thorsten; Murgul, Sebastian; Heizmann, Michael

doi:10.1007/978-3-031-22061-6_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13497)

Included in the following conference series:

International Conference on Smart Multimedia

Abstract

Music signals can nowadays be recorded and processed by many different devices, either to extract additional information such as instrumentation and genre or to reuse parts of those signals in various applications. Music recording quality has a strong impact on all kinds of Music Information Retrieval (MIR) signal processing and on its results. In this work, the recording quality of piano music is estimated by three separate neural network approaches, one each for background noise, sound disturbances, and reverberation. The approaches for background noise and sound disturbances estimate the resulting Signal-to-Noise Ratio (SNR) of the music piece: the former assumes a constant SNR, the latter a time-dependent SNR. Reverberation is estimated by means of two room parameters, the reverberation time and the early decay time. As an example application, the SNR estimates are validated in the field of automatic piano music transcription, where the impact of the estimated recording quality on the transcription results is analysed. According to these results, piano music transcription performance can be predicted from the recording quality parameters.
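The abstract names the quality parameters that the networks estimate: a constant SNR, a time-dependent SNR, and the room parameters reverberation time and early decay time. The neural networks themselves are not reproduced here. The sketch below only illustrates how such target quantities are conventionally computed when the clean signal, the noise, or a room impulse response is known, for example when labelling synthetic training data. It is a minimal NumPy sketch under stated assumptions: all function names are hypothetical, the frame-wise SNR is one possible definition of a "time-dependent" SNR, and the fitting ranges (-5 dB to -35 dB for T30, 0 dB to -10 dB for EDT, both on a Schroeder backward-integrated decay curve) follow common room-acoustics practice rather than anything stated in the chapter.

```python
import numpy as np


def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Global SNR in dB: ratio of mean signal power to mean noise power."""
    p_signal = np.mean(np.asarray(signal, dtype=float) ** 2)
    p_noise = np.mean(np.asarray(noise, dtype=float) ** 2)
    return float(10.0 * np.log10(p_signal / p_noise))


def mix_at_snr(signal: np.ndarray, noise: np.ndarray, target_db: float):
    """Scale `noise` so that signal + scaled noise has the requested global SNR.

    Assumes `signal` and `noise` have the same length. Returns (mixture, scaled_noise).
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve p_signal / (gain**2 * p_noise) = 10**(target_db / 10) for gain.
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (target_db / 10.0)))
    scaled = gain * noise
    return signal + scaled, scaled


def framewise_snr_db(signal, noise, frame_len: int, hop: int, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical time-dependent SNR: one dB value per analysis frame."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    n_frames = max(0, 1 + (len(signal) - frame_len) // hop)
    out = np.empty(n_frames)
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame_len)
        p_s = np.mean(signal[seg] ** 2) + eps  # eps avoids log of zero in silent frames
        p_n = np.mean(noise[seg] ** 2) + eps
        out[i] = 10.0 * np.log10(p_s / p_n)
    return out


def schroeder_edc_db(ir: np.ndarray) -> np.ndarray:
    """Schroeder backward-integrated energy decay curve, normalised to 0 dB at t = 0."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # remaining energy from each sample to the end
    return 10.0 * np.log10(edc / edc[0] + 1e-300)


def decay_time(ir: np.ndarray, fs: float, start_db: float, end_db: float) -> float:
    """Fit a line to the EDC between start_db and end_db and extrapolate to a 60 dB decay."""
    edc = schroeder_edc_db(ir)
    t = np.arange(len(edc)) / fs
    mask = (edc <= start_db) & (edc >= end_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # dB per second (negative)
    return float(-60.0 / slope)


def t30(ir: np.ndarray, fs: float) -> float:
    """Reverberation time evaluated over the -5 dB to -35 dB range (T30)."""
    return decay_time(ir, fs, -5.0, -35.0)


def edt(ir: np.ndarray, fs: float) -> float:
    """Early decay time evaluated over the 0 dB to -10 dB range."""
    return decay_time(ir, fs, 0.0, -10.0)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs             # 1 s time axis
    ir = 10.0 ** (-3.0 * t / 0.5)      # ideal exponential decay: 60 dB of energy in 0.5 s
    print(round(t30(ir, fs), 3), round(edt(ir, fs), 3))  # both approximately 0.5
```

For an ideal exponential decay, T30 and EDT coincide; in real rooms they differ, which is why the chapter treats them as two separate parameters. A real impulse response would additionally need noise-floor handling before the backward integration, which this sketch omits.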

Author information

Authors and Affiliations

Karlsruhe Institute of Technology, Institute of Industrial Information Technology, Hertzstraße 16, 76187, Karlsruhe, Germany
Markus Schwabe, Thorsten Hoffmann & Michael Heizmann
Klangio GmbH, Alter Schlachthof 39, Karlsruhe, Germany
Sebastian Murgul

Corresponding author

Correspondence to Markus Schwabe.

Editor information

Editors and Affiliations

University of Florence, Florence, Italy
Stefano Berretti
Dolby Labs, California, CA, USA
Guan-Ming Su

About this paper

Cite this paper

Schwabe, M., Hoffmann, T., Murgul, S., Heizmann, M. (2022). Estimation of Music Recording Quality to Predict Automatic Music Transcription Performance. In: Berretti, S., Su, GM. (eds) Smart Multimedia. ICSM 2022. Lecture Notes in Computer Science, vol 13497. Springer, Cham. https://doi.org/10.1007/978-3-031-22061-6_24

DOI: https://doi.org/10.1007/978-3-031-22061-6_24
Published: 14 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22060-9
Online ISBN: 978-3-031-22061-6
eBook Packages: Computer Science, Computer Science (R0)