Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

A Chinese handwriting database named HIT-MW is presented to facilitate offline Chinese handwritten text recognition. Both the writers and the texts for handcopying are carefully sampled using a systematic scheme. To collect naturally written handwriting, forms are distributed by postal mail or through intermediaries rather than face to face. The current version of HIT-MW includes 853 forms and 186,444 characters, produced under unconstrained conditions without preprinted character boxes. The statistics show that the database represents real handwriting well. The database can support many new applications in real handwriting recognition.

Author information

Corresponding author

Correspondence to Tonghua Su.

About this article

Cite this article

Su, T., Zhang, T. & Guan, D. Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text. IJDAR 10, 27–38 (2007). https://doi.org/10.1007/s10032-006-0037-6

  • DOI: https://doi.org/10.1007/s10032-006-0037-6
