Document segmentation and classification into musical scores and text

Pedersoli, Fabrizio; Tzanetakis, George

doi:10.1007/s10032-016-0271-5

Original Paper
Published: 12 August 2016

Volume 19, pages 289–304, (2016)

International Journal on Document Analysis and Recognition (IJDAR)

Fabrizio Pedersoli¹ &
George Tzanetakis¹

738 Accesses
6 Citations

Abstract

A new algorithm is proposed for segmenting documents into regions containing musical scores and text. Such segmentation is required before applying optical character recognition and optical music recognition to scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV), which detects the bounding boxes containing musical score and text within a document image. The RBV procedure extracts a fixed number of blocks whose position and size are sampled from a discrete uniform distribution that “over”-covers the input image. Each block is automatically classified as coming from either musical score or text, and it casts a vote, weighted by the posterior probability of its classification, over the region it covers. An initial coarse segmentation is obtained by accumulating all the votes in a single image. The final segmentation is then obtained by subdividing the image into microblocks and classifying them with an N-nearest neighbor classifier trained on the coarse segmentation. We demonstrate the potential of the proposed method with experiments on two datasets. The first is a challenging dataset of images collected, artificially combined, and manipulated for this project. The second is a music dataset obtained by scanning two music books. Results are reported as precision and recall of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85%. The complete source code package and associated data are available at https://github.com/fpeder/mscr under the FreeBSD license to support reproducibility.
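
The voting and evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above); it only assumes what the abstract states. The function names, the parameters `n_blocks`, `min_size`, `max_size`, and `seed`, and the `classify` callback are all hypothetical, and the callback stands in for the bag-of-visual-words classifier. For simplicity the sketch keeps every block inside the image rather than "over"-covering it, and it omits the microblock refinement with the N-nearest neighbor classifier.

```python
import random


def random_block_voting(height, width, classify, n_blocks=300,
                        min_size=8, max_size=32, seed=0):
    """Accumulate posterior-weighted votes from randomly sampled blocks.

    `classify(top, left, h, w)` is a hypothetical stand-in for a block
    classifier; it must return P(block is musical score) in [0, 1].
    """
    rng = random.Random(seed)
    score_votes = [[0.0] * width for _ in range(height)]
    text_votes = [[0.0] * width for _ in range(height)]
    for _ in range(n_blocks):
        # Block size and position are drawn from discrete uniform distributions.
        h = rng.randint(min_size, min(max_size, height))
        w = rng.randint(min_size, min(max_size, width))
        top = rng.randint(0, height - h)
        left = rng.randint(0, width - w)
        p_score = classify(top, left, h, w)
        # The block votes for both classes over every pixel it covers.
        for y in range(top, top + h):
            for x in range(left, left + w):
                score_votes[y][x] += p_score
                text_votes[y][x] += 1.0 - p_score
    return score_votes, text_votes


def coarse_labels(score_votes, text_votes):
    """Label each pixel by its larger vote total; None on ties or no votes."""
    return [
        ["score" if s > t else "text" if t > s else None
         for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(score_votes, text_votes)
    ]


def overlap_f_measure(pred, truth, cls):
    """Precision, recall, and F-measure of the overlapping area for one class."""
    pairs = [(p, t) for p_row, t_row in zip(pred, truth)
             for p, t in zip(p_row, t_row)]
    overlap = sum(1 for p, t in pairs if p == cls and t == cls)
    pred_area = sum(1 for p, _ in pairs if p == cls)
    true_area = sum(1 for _, t in pairs if t == cls)
    precision = overlap / pred_area if pred_area else 0.0
    recall = overlap / true_area if true_area else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    H, W, SPLIT = 40, 60, 30  # toy page: score on the left, text on the right

    def oracle(top, left, h, w):
        # Toy posterior: fraction of the block's columns left of SPLIT.
        return max(0, min(left + w, SPLIT) - left) / w

    truth = [["score" if x < SPLIT else "text" for x in range(W)]
             for _ in range(H)]
    votes = random_block_voting(H, W, oracle)
    labels = coarse_labels(*votes)
    for cls in ("score", "text"):
        print(cls, overlap_f_measure(labels, truth, cls))
```

Blocks that straddle the boundary between regions cast mixed votes, so the accumulated map is only a coarse segmentation near region edges, which is consistent with the abstract's use of a second, finer classification stage.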


Notes

Width \(\times\) height.
Conversely, the real dataset meets the width requirement in the majority of the cases.


Acknowledgments

We would like to thank the Social Sciences and Humanities Research Council of Canada and the University of Brescia, Italy, for funding this work.

Author information

Authors and Affiliations

Computer Science Department, University of Victoria, P.O. Box 3055, STN CSC, Victoria, BC, V8W 3P6, Canada
Fabrizio Pedersoli & George Tzanetakis


Corresponding author

Correspondence to Fabrizio Pedersoli.


About this article

Cite this article

Pedersoli, F., Tzanetakis, G. Document segmentation and classification into musical scores and text. IJDAR 19, 289–304 (2016). https://doi.org/10.1007/s10032-016-0271-5


Received: 18 September 2015
Revised: 31 May 2016
Accepted: 30 July 2016
Published: 12 August 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10032-016-0271-5
