Abstract
A new algorithm for segmenting documents into regions containing musical scores and text is proposed. Such segmentation is a required step before applying optical character recognition and optical music recognition to scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV), which detects the bounding boxes containing the musical score and text within a document image. The RBV procedure extracts a fixed number of blocks whose position and size are sampled from a discrete uniform distribution that “over”-covers the input image. Each block is automatically classified as coming from either a musical score or text, and votes over its spatial extent with the posterior probability of its classification. An initial coarse segmentation is obtained by accumulating all the votes in a single image. The final segmentation is then obtained by subdividing the image into microblocks and classifying them with an N-nearest neighbor classifier trained on the coarse segmentation. We demonstrate the potential of the proposed method with experiments on two datasets. The first is a challenging dataset of images collected, artificially combined, and manipulated for this project. The second is a music dataset obtained by scanning two music books. The results are reported using precision/recall metrics of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85%. The complete source code package and associated data are available at https://github.com/fpeder/mscr under the FreeBSD license to support reproducibility.
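The voting and evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the code from the authors' repository: the function names, the block sizes, the number of blocks, and the `classify_block` callback (a stand-in for the bag-of-visual-words classifier) are all assumptions made for illustration, and the microblock refinement stage is omitted.

```python
import numpy as np

MUSIC, TEXT = 0, 1  # class indices used in this sketch (assumed, not from the paper)


def random_block_voting(image, classify_block, n_blocks=500,
                        sizes=(32, 64, 128), seed=0):
    """Accumulate per-pixel class votes from randomly sampled blocks.

    classify_block(block) -> (p_music, p_text) is a hypothetical stand-in
    for the block classifier; it must return posterior probabilities.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    votes = np.zeros((2, h, w), dtype=float)
    for _ in range(n_blocks):
        # Block height and width drawn from a discrete uniform distribution.
        bh = int(rng.choice(sizes))
        bw = int(rng.choice(sizes))
        # The top-left corner may fall outside the image, so that border
        # pixels are covered as often as interior ones ("over"-covering).
        y0 = int(rng.integers(-bh + 1, h))
        x0 = int(rng.integers(-bw + 1, w))
        ys, ye = max(y0, 0), min(y0 + bh, h)
        xs, xe = max(x0, 0), min(x0 + bw, w)
        p_music, p_text = classify_block(image[ys:ye, xs:xe])
        # Each block votes with its posterior over its own spatial extent.
        votes[MUSIC, ys:ye, xs:xe] += p_music
        votes[TEXT, ys:ye, xs:xe] += p_text
    return votes


def coarse_labels(votes):
    """Per-pixel winning class; -1 marks pixels that received no votes."""
    labels = votes.argmax(axis=0)
    labels[votes.sum(axis=0) == 0] = -1
    return labels


def area_f_measure(pred_mask, true_mask):
    """Precision, recall and F-measure of the overlapping area of two masks."""
    tp = np.logical_and(pred_mask, true_mask).sum()
    precision = tp / pred_mask.sum() if pred_mask.sum() else 0.0
    recall = tp / true_mask.sum() if true_mask.sum() else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy page: left half "music" (1.0), right half "text" (0.0).
    page = np.zeros((200, 200))
    page[:, :100] = 1.0

    def toy_classifier(block):
        p = float(block.mean())  # fraction of "music" pixels in the block
        return p, 1.0 - p

    labels = coarse_labels(random_block_voting(page, toy_classifier))
    print(area_f_measure(labels == MUSIC, page == 1.0))
```

In this sketch the coarse segmentation is simply the per-pixel class with the larger accumulated vote. In the method described above, that coarse result is used only as training data for the microblock classifier that produces the final segmentation.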
Notes
Width \(\times \) height.
Conversely, the real dataset meets the width requirement in the majority of the cases.
Acknowledgments
We would like to thank the Social Sciences and Humanities Research Council of Canada and the University of Brescia, Italy, for funding this work.
Cite this article
Pedersoli, F., Tzanetakis, G. Document segmentation and classification into musical scores and text. IJDAR 19, 289–304 (2016). https://doi.org/10.1007/s10032-016-0271-5
DOI: https://doi.org/10.1007/s10032-016-0271-5