ABSTRACT
Chinese calligraphy, the art of handwriting, is an important part of traditional Chinese culture. However, because calligraphic characters have complex shapes and varied styles, ordinary readers find them difficult to recognize, so a tool that helps users identify unknown calligraphic characters would be valuable. Well-known OCR (Optical Character Recognition) technology offers little help here, because calligraphic characters are deformed and complex. Numerous historical Chinese calligraphic works have been digitized and stored in the CADAL (China Academic Digital Associative Library) calligraphic system [1], and a large database, CCD (Calligraphic Character Dictionary), has been built that contains character images labeled with their semantic meaning. In this paper, we propose an LSH-based method for large-scale Chinese calligraphic character recognition built on CCD. Our method uses the GIST descriptor to represent the global features of calligraphic character images, and LSH (Locality-Sensitive Hashing) to search CCD for character images similar to the image being recognized. Recognition is based on a semantic probability computed from the ranks of the retrieved images and their distances to the query image in the GIST feature space. Our experiments show that the method recognizes Chinese calligraphic character images effectively and efficiently.
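The retrieve-then-vote pipeline described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: GIST extraction is not shown (random vectors stand in for GIST descriptors), the index follows the p-stable LSH scheme of Datar et al. with illustrative parameters (`n_tables`, `n_hashes`, `w` are our choices), and `semantic_probability` uses a hypothetical weighting, `exp(-d / sigma) / (rank + 1)`, because the abstract does not give the exact formula.

```python
import numpy as np
from collections import defaultdict


class PStableLSH:
    """Sketch of a p-stable (E2LSH-style) index for Euclidean distance.

    Each table hashes a vector v with n_hashes functions
    h(v) = floor((a . v + b) / w), with a ~ N(0, I) (Gaussian is 2-stable)
    and b ~ U[0, w). Parameter values are illustrative, not tuned.
    """

    def __init__(self, dim, n_tables=8, n_hashes=6, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_tables, n_hashes, dim))
        self.b = rng.uniform(0.0, w, size=(n_tables, n_hashes))
        self.w = w
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.data = np.empty((0, dim))

    def _keys(self, v):
        # One bucket key (a tuple of n_hashes ints) per table.
        proj = np.floor((self.a @ v + self.b) / self.w).astype(int)
        return [tuple(row) for row in proj.tolist()]

    def index(self, vectors):
        self.data = np.asarray(vectors, dtype=float)
        for i, v in enumerate(self.data):
            for table, key in zip(self.tables, self._keys(v)):
                table[key].append(i)

    def query(self, q, k=10):
        """Return up to k (row index, Euclidean distance) pairs, nearest first."""
        cand = set()
        for table, key in zip(self.tables, self._keys(q)):
            cand.update(table.get(key, ()))  # .get does not create buckets
        if not cand:
            return []
        cand = np.array(sorted(cand))
        dist = np.linalg.norm(self.data[cand] - q, axis=1)  # exact re-ranking
        order = np.argsort(dist)[:k]
        return [(int(cand[j]), float(dist[j])) for j in order]


def semantic_probability(neighbors, labels, sigma=1.0):
    """HYPOTHETICAL rank/distance weighting (not the paper's formula).

    Each retrieved image votes for its semantic label with weight
    exp(-dist / sigma) / (rank + 1); votes are normalized to sum to 1.
    """
    scores = defaultdict(float)
    for rank, (idx, dist) in enumerate(neighbors):
        scores[labels[idx]] += np.exp(-dist / sigma) / (rank + 1)
    total = sum(scores.values())
    return {lab: s / total for lab, s in scores.items()} if total > 0 else {}


# Usage: synthetic 64-d vectors stand in for GIST descriptors (assumption).
rng = np.random.default_rng(1)
centers = rng.normal(scale=10.0, size=(3, 64))
feats = np.vstack([c + rng.normal(scale=0.1, size=(50, 64)) for c in centers])
labels = [0] * 50 + [1] * 50 + [2] * 50  # semantic label of each database image
lsh = PStableLSH(dim=64)
lsh.index(feats)
query = centers[1] + rng.normal(scale=0.1, size=64)
probs = semantic_probability(lsh.query(query, k=10), labels)
best = max(probs, key=probs.get)  # recognized label
```

Using several hash tables trades memory for recall: a true neighbor missed in one table is likely to collide with the query in another, and exact distances are computed only for the colliding candidates rather than for the whole database.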
REFERENCES
- CADAL calligraphic system web site: http://www.cadal.zju.edu.cn/Calligraphy/.
- K. Yu, J. Wu, and Y. Zhuang. Skeleton-Based Recognition of Chinese Calligraphic Character Image. In Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing (PCM '08), pages 228--237. Springer, 2008.
- A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 42:145--175, 2001.
- A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS, pages 459--468, 2006.
- T. M. Rath, S. Kane, A. Lehman, E. Partridge, and R. Manmatha. Indexing for a Digital Library of George Washington's Manuscripts: A Study of Word Matching Techniques. CIIR Technical Report, 2002.
- I. Bar Yosef, K. Kedem, I. Dinstein, M. Beit-Arie, and E. Engel. Classification of Hebrew Calligraphic Handwriting Styles: Preliminary Results. In Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL '04), pages 299--305, 2004.
- D. Shi, R. I. Damper, and S. R. Gunn. Offline handwritten Chinese character recognition by radical decomposition. ACM Transactions on Asian Language Information Processing (TALIP), pages 27--48, 2003.
- D. Doermann, E. Rivlin, and I. Weiss. Applying algebraic and differential invariants for logo recognition. Machine Vision and Applications, pages 73--86, 1996.
- J. Hays and A. Efros. Scene completion using millions of photographs. In SIGGRAPH, 2007.
- M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid. Evaluation of GIST descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval, 2009.
- P. Indyk and R. Motwani. Approximate nearest neighbor: towards removing the curse of dimensionality. In Proceedings of the Symposium on Theory of Computing, 1998.
- A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases, pages 518--529. Morgan Kaufmann, 1999.
- J. Buhler. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17:419--428, 2001.
- J. Buhler. Provably sensitive indexing strategies for biosequence similarity search. In Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB02), 2002.
- J. Buhler and M. Tompa. Finding motifs using random projections. In Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB01), 2001.
- E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. Ullman, and C. Yang. Finding interesting associations without support pruning. In Proceedings of the 16th International Conference on Data Engineering (ICDE), 2000.
- B. Georgescu, I. Shimshoni, and P. Meer. Mean shift based clustering in high dimensions: A texture classification example. In Proceedings of the 9th International Conference on Computer Vision, 2003.
- T. Haveliwala, A. Gionis, and P. Indyk. Scalable techniques for clustering the web. WebDB Workshop, 2000.
- Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of collections of files. In Proceedings of the International Conference on Web Information Systems Engineering (WISE), 2002.
- N. Shivakumar. Detecting digital copyright violations on the Internet. Ph.D. thesis, Department of Computer Science, Stanford University, 2000.
- C. Yang. MACS: Music audio characteristic sequence indexing for similarity retrieval. In Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics, 2001.
- M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, pages 253--262, 2004.
Index Terms
- LSH-based large-scale Chinese calligraphic character recognition