DOI: 10.1145/2037342.2037365
research-article

Development of Nom character segmentation for collecting patterns from historical document pages

Published: 16 September 2011

ABSTRACT

In this paper, we present the first effort in preprocessing and character segmentation of digitized Nom document pages toward their digital archiving. Nom is an ideographic script that was used to write Vietnamese from the 10th to the 20th century. Because the pages have varied and complex layouts, we propose an efficient method based on connected component analysis for extracting characters from images. The area Voronoi diagram is then employed to represent the neighborhood and boundary of connected components. Based on this representation, each character can be considered as a group of adjacent Voronoi regions. To improve segmentation performance, we use the recursive x-y cut method to segment separated regions. We evaluate the method on several pages with different layouts. The results confirm that the method is effective for character segmentation in Nom documents.


    • Published in

      ACM Other conferences
      HIP '11: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
      September 2011
      195 pages
      ISBN: 9781450309165
      DOI: 10.1145/2037342

      Copyright © 2011 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 52 of 90 submissions, 58%
