Document segmentation and classification into musical scores and text

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

A new algorithm is proposed for segmenting documents into regions containing musical scores and regions containing text. Such segmentation is a required step before applying optical character recognition and optical music recognition to scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV), which detects the bounding boxes of the musical score and text within a document image. The RBV procedure extracts a fixed number of blocks whose positions and sizes are sampled from a discrete uniform distribution that “over”-covers the input image. Each block is automatically classified as coming from either musical score or text, and it votes over the area it covers with the posterior probability of its classification. An initial coarse segmentation is obtained by accumulating all the votes in a single image. The final segmentation is then obtained by subdividing the image into microblocks and classifying them with an N-nearest neighbor classifier trained on the coarse segmentation. We demonstrate the potential of the proposed method through experiments on two different datasets: a challenging dataset of images collected, artificially combined, and manipulated for this project, and a music dataset obtained by scanning two music books. The results are reported using precision/recall metrics of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85%. The complete source code package and associated data are available at https://github.com/fpeder/mscr under the FreeBSD license to support reproducibility.
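The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that): the function names `sample_blocks` and `random_block_voting`, the block-size range, the number of blocks, and the `classify_block` callback are all assumptions made for illustration. In particular, `classify_block` stands in for the bag-of-visual-words classifier and is only expected to return a posterior probability that a block shows musical score.

```python
import numpy as np


def sample_blocks(height, width, n_blocks, min_size, max_size, rng):
    """Sample block sizes and top-left corners from discrete uniform distributions.

    Corners may lie outside the image (negative coordinates) so that border
    pixels are covered about as often as interior ones; this is one possible
    reading of "over"-covering, assumed here for illustration.
    """
    heights = rng.integers(min_size, max_size + 1, size=n_blocks)
    widths = rng.integers(min_size, max_size + 1, size=n_blocks)
    tops = rng.integers(1 - heights, height)   # per-block lower bound
    lefts = rng.integers(1 - widths, width)
    return list(zip(tops, lefts, heights, widths))


def random_block_voting(image, classify_block, n_blocks=500,
                        min_size=32, max_size=128, seed=0):
    """Accumulate per-block posterior votes into a coarse score-probability map.

    classify_block(block) -> float in [0, 1] is a hypothetical stand-in for
    the trained block classifier. Pixels that receive no vote default to 0.5.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    votes = np.zeros((h, w), dtype=float)
    counts = np.zeros((h, w), dtype=float)

    for top, left, bh, bw in sample_blocks(h, w, n_blocks, min_size, max_size, rng):
        # Clip the block to the image bounds.
        r0, r1 = max(top, 0), min(top + bh, h)
        c0, c1 = max(left, 0), min(left + bw, w)
        if r0 >= r1 or c0 >= c1:
            continue
        p_score = classify_block(image[r0:r1, c0:c1])
        votes[r0:r1, c0:c1] += p_score   # the vote covers the block's spatial extent
        counts[r0:r1, c0:c1] += 1.0

    # Average vote per pixel; leave unvisited pixels at the neutral value 0.5.
    return np.divide(votes, counts, out=np.full((h, w), 0.5), where=counts > 0)
```

Thresholding the returned map (for example at 0.5) would give a coarse score/text labeling; the microblock refinement with a nearest-neighbor classifier described in the abstract is not covered by this sketch.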

Notes

  1. Width \(\times\) height.

  2. Conversely, the real dataset meets the width requirement in the majority of the cases.

Acknowledgments

We would like to thank the Social Sciences and Humanities Research Council of Canada and the University of Brescia, Italy, for funding this work.

Author information

Correspondence to Fabrizio Pedersoli.

Cite this article

Pedersoli, F., Tzanetakis, G. Document segmentation and classification into musical scores and text. IJDAR 19, 289–304 (2016). https://doi.org/10.1007/s10032-016-0271-5
