DOI: 10.1145/2467696.2467704
research-article

LSH-based large scale Chinese calligraphic character recognition

Published: 22 July 2013

ABSTRACT

Chinese calligraphy is the art of handwriting and an important part of traditional Chinese culture. Because calligraphic characters vary widely in shape and style, most people find them difficult to recognize, so a tool that helps users identify unknown calligraphic characters would be valuable. Conventional OCR (Optical Character Recognition) technology is of little help here because of the characters' deformation and complexity. Numerous collections of historical Chinese calligraphic works have been digitized and stored in the CADAL (China Academic Digital Associate Library) calligraphic system [1], and a large database, CCD (Calligraphic Character Dictionary), has been built; it contains character images labeled with their semantic meaning. In this paper, we propose an LSH-based method for large-scale Chinese calligraphic character recognition built on CCD. The method uses the GIST descriptor to represent the global features of calligraphic character images and LSH (locality-sensitive hashing) to search CCD for images similar to the query character image. Recognition is based on a semantic probability computed from the ranks of the retrieved images and their distances to the query image in the GIST feature space. Our experiments show that the method is effective and efficient for recognizing Chinese calligraphic character images.
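The retrieval-then-vote pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: GIST extraction is omitted (plain float vectors stand in for GIST descriptors), the hashing follows the general p-stable scheme for Euclidean distance, and the weighting 1 / (rank * (1 + distance)) is only an assumed stand-in for the paper's semantic probability, whose exact formula the abstract does not give. The names `L2LSHIndex` and `semantic_probabilities` are hypothetical.

```python
import math
import random
from collections import defaultdict


class L2LSHIndex:
    """Toy LSH index for Euclidean distance using Gaussian random projections."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, bucket_width=1.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        self.items = []  # list of (vector, label)
        self.tables = []
        for _ in range(num_tables):
            # Each hash function is h(v) = floor((a . v + b) / w).
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v):
        # Concatenate the hash values of one table into a single bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, vector, label):
        idx = len(self.items)
        self.items.append((vector, label))
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, vector)].append(idx)

    def query(self, vector, top_k=5):
        # Collect candidates from the matching bucket of every table,
        # then rank them by exact Euclidean distance.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, vector), []))
        scored = sorted(
            (math.dist(vector, self.items[i][0]), self.items[i][1]) for i in candidates
        )
        return scored[:top_k]


def semantic_probabilities(neighbors):
    """Turn ranked (distance, label) pairs into per-label probabilities.

    Assumed weighting (not from the paper): 1 / (rank * (1 + distance)).
    """
    weights = defaultdict(float)
    for rank, (dist, label) in enumerate(neighbors, start=1):
        weights[label] += 1.0 / (rank * (1.0 + dist))
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()} if total else {}


# Usage: index a few labeled vectors, then recognize a query vector.
index = L2LSHIndex(dim=4, seed=1)
index.add([0.1, 0.2, 0.3, 0.4], "yong")
index.add([0.1, 0.2, 0.3, 0.41], "yong")
index.add([5.0, 5.0, 5.0, 5.0], "he")
hits = index.query([0.1, 0.2, 0.3, 0.4])
probs = semantic_probabilities(hits)
print(max(probs, key=probs.get), probs)
```

In this sketch, more hash tables raise the chance that true neighbors share a bucket with the query, while more hash functions per table make buckets more selective; the defaults are arbitrary and would need tuning on real descriptors.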

References

  1. CADAL calligraphic system web site: http://www.cadal.zju.edu.cn/Calligraphy/.
  2. K. Yu, J. Wu, and Y. Zhuang. Skeleton-Based Recognition of Chinese Calligraphic Character Image. In Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing, pages 228--237. Springer, 2008.
  3. A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 42:145--175, 2001.
  4. A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS, pages 459--468, 2006.
  5. T.M. Rath, S. Kane, A. Lehman, E. Partridge, and R. Manmatha. Indexing for a Digital Library of George Washington's Manuscripts: A Study of Word Matching Techniques. CIIR Technical Report, 2002.
  6. Itay Bar Yosef, Klara Kedem, Its'hak Dinstein, Malachi Beit-Arie, and Edna Engel. Classification of Hebrew Calligraphic Handwriting Styles: Preliminary Results. In Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL '04), pages 299--305, 2004.
  7. Daming Shi, Robert I. Damper, and Steve R. Gunn. Offline handwritten Chinese character recognition by radical decomposition. ACM Transactions on Asian Language Information Processing (TALIP), pages 27--48, 2003.
  8. D. Doermann, E. Rivlin, and I. Weiss. Applying algebraic and differential invariants for logo recognition. Machine Vision and Applications, pages 73--86, 1996.
  9. J. Hays and A. Efros. Scene completion using millions of photographs. In SIGGRAPH, 2007.
  10. M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid. Evaluation of Gist descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval, 2009.
  11. P. Indyk and R. Motwani. Approximate nearest neighbor: towards removing the curse of dimensionality. Proceedings of the Symposium on Theory of Computing, 1998.
  12. A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases, pages 518--529. Morgan Kaufmann, 1999.
  13. J. Buhler. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17:419--428, 2001.
  14. J. Buhler. Provably sensitive indexing strategies for biosequence similarity search. Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB02), 2002.
  15. J. Buhler and M. Tompa. Finding motifs using random projections. Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB01), 2001.
  16. E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. Ullman, and C. Yang. Finding interesting associations without support pruning. Proceedings of the 16th International Conference on Data Engineering (ICDE), 2000.
  17. B. Georgescu, I. Shimshoni, and P. Meer. Mean shift based clustering in high dimensions: A texture classification example. Proceedings of the 9th International Conference on Computer Vision, 2003.
  18. T. Haveliwala, A. Gionis, and P. Indyk. Scalable techniques for clustering the web. WebDB Workshop, 2000.
  19. Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of collections of files. Proceedings of the International Conference on Web Information Systems Engineering (WISE), 2002.
  20. N. Shivakumar. Detecting digital copyright violations on the Internet (Ph.D. thesis). Department of Computer Science, Stanford University, 2000.
  21. C. Yang. Macs: Music audio characteristic sequence indexing for similarity retrieval. Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics, 2001.
  22. M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, pages 253--262, 2004.

    • Published in

      JCDL '13: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
      July 2013
      480 pages
      ISBN: 9781450320771
      DOI: 10.1145/2467696

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      JCDL '13 paper acceptance rate: 28 of 95 submissions, 29%. Overall acceptance rate: 415 of 1,482 submissions, 28%.
