DOI: 10.1145/2505377.2505393
research-article

Text graphic separation in Indian newspapers

Published: 24 August 2013

ABSTRACT

Digitization of newspaper articles is important for registering historical events. Layout analysis of Indian newspapers is a challenging task due to the presence of different font sizes and font styles and the random placement of text and non-text regions. In this paper we propose a novel framework for learning optimal parameters for text graphic separation in the presence of complex layouts. The learning problem is formulated as an optimization problem and solved with the EM algorithm, so that the learned parameters depend on the nature of the document content.
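As a rough illustration of what EM-based parameter learning looks like in general, the sketch below fits a two-component one-dimensional Gaussian mixture to a scalar feature. It is not the formulation used in the paper, which the abstract does not spell out. The feature, the function names (`gaussian_pdf`, `em_two_gaussians`), and the synthetic data are all assumptions made for the example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture to `values` with EM."""
    # Crude initialisation: place the two means at the extremes of the data.
    means = [min(values), max(values)]
    overall_mean = sum(values) / len(values)
    overall_var = sum((v - overall_mean) ** 2 for v in values) / len(values)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: posterior probability that each value came from component 1.
        resp = []
        for v in values:
            p0 = weights[0] * gaussian_pdf(v, means[0], variances[0])
            p1 = weights[1] * gaussian_pdf(v, means[1], variances[1])
            resp.append(p1 / (p0 + p1))

        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in (0, 1):
            r = [ri if k == 1 else 1.0 - ri for ri in resp]
            total = sum(r)
            weights[k] = total / len(values)
            means[k] = sum(ri * v for ri, v in zip(r, values)) / total
            variances[k] = max(
                sum(ri * (v - means[k]) ** 2 for ri, v in zip(r, values)) / total,
                1e-6,  # floor keeps a component from collapsing to zero variance
            )
    return weights, means, variances


rng = random.Random(0)
# Synthetic stand-in for a per-region feature; hypothetical, not data from the paper.
features = [rng.gauss(2.0, 0.5) for _ in range(200)] + [rng.gauss(8.0, 1.0) for _ in range(200)]
weights, means, variances = em_two_gaussians(features)
```

In a text graphic separation setting, the two components could loosely stand for text-like and graphic-like regions, with the fitted means and variances playing the role of parameters adapted to the document content. That mapping is only an analogy here.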


      • Published in

        ACM Other conferences
        MOCR '13: Proceedings of the 4th International Workshop on Multilingual OCR
        August 2013
        99 pages
        ISBN: 9781450321143
        DOI: 10.1145/2505377

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        MOCR '13 Paper Acceptance Rate: 17 of 34 submissions, 50%
        Overall Acceptance Rate: 17 of 34 submissions, 50%
