Estimation of Music Recording Quality to Predict Automatic Music Transcription Performance

  • Conference paper
Smart Multimedia (ICSM 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13497)

Abstract

Music signals can now be recorded and processed on many different devices, whether to extract additional information such as instruments and genre or to reuse parts of the signals in other applications. Recording quality therefore has a strong impact on Music Information Retrieval (MIR) signal processing and its results. In this work, the recording quality of piano music is estimated by three separate neural network approaches, one each for background noise, sound disturbances, and reverberation. The approaches for background noise and sound disturbances estimate the resulting signal-to-noise ratio (SNR) of the music piece, the former for a constant SNR and the latter for the time-dependent case. Reverberation is estimated via two room parameters, reverberation time and early decay time. As an example application, the SNR estimates are validated in piano music transcription by analysing how the estimated recording quality affects the automatic transcription results. According to those results, piano music transcription performance can be predicted from the recording quality parameters.
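
The quantities named in the abstract have standard signal-processing definitions, which the following minimal Python sketch illustrates. It is not the authors' neural network approach: it computes reference values under the assumption that the clean music signal, the added noise, and a room impulse response are each known separately. The function names, the frame length of 2048 samples, and the fit ranges (0 to -10 dB for early decay time, -5 to -35 dB for reverberation time) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def snr_db(clean, noise):
    """Constant (global) SNR in dB from separately known signal and noise."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


def framewise_snr_db(clean, noise, frame_len=2048, eps=1e-12):
    """Time-dependent SNR in dB: one value per non-overlapping frame.

    frame_len is an assumed value; eps guards against silent frames.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    n_frames = min(len(clean), len(noise)) // frame_len
    c = clean[: n_frames * frame_len].reshape(n_frames, frame_len)
    n = noise[: n_frames * frame_len].reshape(n_frames, frame_len)
    p_signal = np.mean(c ** 2, axis=1) + eps
    p_noise = np.mean(n ** 2, axis=1) + eps
    return 10.0 * np.log10(p_signal / p_noise)


def decay_time_s(impulse_response, sr, start_db, end_db):
    """Time for a 60 dB decay, extrapolated from a line fit to the
    Schroeder energy decay curve between start_db and end_db.

    Assumed conventions: (0, -10) gives the early decay time,
    (-5, -35) gives a reverberation time estimate.
    """
    h = np.asarray(impulse_response, dtype=float)
    # Schroeder backward integration: energy remaining after each sample.
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(energy / energy[0])
    t = np.arange(len(h)) / sr
    mask = (edc_db <= start_db) & (edc_db >= end_db)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # slope in dB per second
    return -60.0 / slope
```

Calling `snr_db` on a whole piece corresponds to the constant-SNR case, `framewise_snr_db` to the time-dependent case, and `decay_time_s(h, sr, 0, -10)` or `decay_time_s(h, sr, -5, -35)` to the two room parameters. Values of this kind could serve as targets when checking an estimator, whereas the paper's approaches estimate them from the degraded recording alone.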

Notes

  1. https://piano2notes.com

Author information

Corresponding author

Correspondence to Markus Schwabe.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Schwabe, M., Hoffmann, T., Murgul, S., Heizmann, M. (2022). Estimation of Music Recording Quality to Predict Automatic Music Transcription Performance. In: Berretti, S., Su, GM. (eds) Smart Multimedia. ICSM 2022. Lecture Notes in Computer Science, vol 13497. Springer, Cham. https://doi.org/10.1007/978-3-031-22061-6_24

  • DOI: https://doi.org/10.1007/978-3-031-22061-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22060-9

  • Online ISBN: 978-3-031-22061-6

  • eBook Packages: Computer Science, Computer Science (R0)
