Abstract
It is difficult to segment mixed Chinese/English documents when many italic characters are scattered throughout them. Most previous work has focused on English documents. However, mixed documents differ from English documents, and some of their special features must be taken into account. This paper proposes a new approach to the problem. First, an appropriate character area is selected for italic detection. Next, a two-step strategy is adopted: italic determination is performed first, and the slant angle is estimated only if the character pattern is identified as italic. Finally, the italic character pattern is corrected by a shear transform. A method that uses a two-step weighted projection profile histogram for italic determination is introduced, together with a fast algorithm for slant angle estimation. Three large sample collections, consisting of characters, character pairs, and documents respectively, are used to evaluate the method, and encouraging results are achieved.
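The processing order described in the abstract can be illustrated with a minimal sketch. First comes a cheap italic determination. Slant estimation runs only for patterns judged italic, and a shear transform then corrects them. This is not the authors' method. The paper's two-step weighted projection profile histogram and its fast slant estimator are not reproduced here. Instead, both steps use an unweighted vertical projection profile scored by the sum of squared column counts, a common slant criterion, and the estimate comes from a brute-force angle search. The string-based image format, the function names, the probe angle, the ratio threshold, and the search range are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical representation: a binary character pattern is a list of
# equal-length strings, '#' = foreground, row 0 = top (y grows downward).


def foreground(img):
    """Return the (x, y) coordinates of all foreground pixels."""
    return [(x, y) for y, row in enumerate(img) for x, ch in enumerate(row) if ch == "#"]


def shear_points(points, height, s):
    """Horizontal shear: row y moves left by round((height - 1 - y) * s).

    The bottom row stays fixed, so s = tan(slant) undoes a rightward lean."""
    return [(x - math.floor((height - 1 - y) * s + 0.5), y) for x, y in points]


def sharpness(points):
    """Sum of squared column counts of the vertical projection profile.

    Upright strokes concentrate in few columns, so the score peaks when the
    shear cancels the slant (unweighted; not the paper's weighted histogram)."""
    profile = Counter(x for x, _ in points)
    return sum(c * c for c in profile.values())


def is_italic(img, probe_deg=15, ratio=1.2):
    """Step 1 (cheap, hypothetical rule): does one trial shear sharpen the profile?"""
    pts, h = foreground(img), len(img)
    upright = sharpness(pts)
    probe = sharpness(shear_points(pts, h, math.tan(math.radians(probe_deg))))
    return probe > ratio * upright


def estimate_slant(img, max_deg=30, step_deg=1):
    """Step 2 (brute force): angle in degrees that maximises sharpness."""
    pts, h = foreground(img), len(img)
    best_deg, best_score = 0, sharpness(pts)
    for deg in range(-max_deg, max_deg + 1, step_deg):
        score = sharpness(shear_points(pts, h, math.tan(math.radians(deg))))
        if score > best_score:  # strict '>': on ties the stored angle (initially 0) is kept
            best_deg, best_score = deg, score
    return best_deg, best_score


def correct(img, deg):
    """Shear the pattern by `deg` degrees and re-render it as strings."""
    h = len(img)
    pts = shear_points(foreground(img), h, math.tan(math.radians(deg)))
    if not pts:
        return list(img)
    min_x = min(x for x, _ in pts)
    width = max(x for x, _ in pts) - min_x + 1
    rows = [["."] * width for _ in range(h)]
    for x, y in pts:
        rows[y][x - min_x] = "#"
    return ["".join(r) for r in rows]


def process(img):
    """Two-step strategy: estimate and correct slant only for italic patterns."""
    if not is_italic(img):
        return img, 0
    deg, _ = estimate_slant(img)
    return correct(img, deg), deg


def make_slanted_stroke(height, deg):
    """Synthetic test data: one stroke leaning right by `deg` >= 0 degrees."""
    s = math.tan(math.radians(deg))
    xs = [math.floor((height - 1 - y) * s + 0.5) for y in range(height)]
    width = max(xs) + 1
    return ["." * x + "#" + "." * (width - x - 1) for x in xs]


if __name__ == "__main__":
    pattern = make_slanted_stroke(20, 20)  # synthetic stroke leaning 20 degrees
    fixed, angle = process(pattern)
    print("estimated slant:", angle, "degrees")
    print("\n".join(fixed))
```

Because the cheap determination step runs first, a pattern judged upright skips the angle search entirely. This is one plausible reason for ordering the two steps this way. With only integer-degree candidates and pixel rounding, several neighbouring angles can align a short stroke equally well, so the estimate is only accurate to within a few degrees.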
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xia, Y., Wang, CH., Dai, RW. (2006). Segmentation of Mixed Chinese/English Document Including Scattered Italic Characters. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_2
DOI: https://doi.org/10.1007/11940098_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)