Abstract
A Chinese handwriting database named HIT-MW is presented to facilitate offline recognition of Chinese handwritten text. Both the writers and the texts to be hand-copied are carefully sampled with a systematic scheme. To collect naturally written handwriting, forms are distributed by postal mail or through intermediaries rather than face to face. The current version of HIT-MW includes 853 forms and 186,444 characters, all produced under unconstrained conditions without preprinted character boxes. The statistics show that the database represents real handwriting well. The database can support many new applications in real handwriting recognition.
Su, T., Zhang, T. & Guan, D. Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text. IJDAR 10, 27–38 (2007). https://doi.org/10.1007/s10032-006-0037-6
DOI: https://doi.org/10.1007/s10032-006-0037-6