
Segmentation of Mixed Chinese/English Document Including Scattered Italic Characters

  • Conference paper
Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead (ICCPOL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Abstract

Segmenting mixed Chinese/English documents is difficult when many italic characters are scattered through the text. Most prior work focuses on English documents, but mixed documents differ from English ones and have special features that must be taken into account. This paper proposes a new approach to the problem. First, an appropriate character area is chosen for italic detection. Next, a two-step strategy is applied: italic determination is performed first, and the slant angle is estimated only if the character pattern is identified as italic. Finally, the italic character pattern is corrected by a shear transform. We introduce a two-step weighted projection profile histogram method for italic determination and a fast algorithm for slant angle estimation. The method is evaluated on three large sample collections, consisting of characters, character pairs, and documents respectively, with encouraging results.
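The pipeline in the abstract (italic determination, slant angle estimation, shear correction) can be illustrated with a minimal sketch. The abstract does not give the details of the two-step weighted projection profile histogram or of the fast slant estimator, so the code below substitutes a plain, unweighted projection-profile search: it shears a binary character image over a range of candidate angles and keeps the angle whose vertical projection profile is most sharply peaked. The function names `shear`, `estimate_slant`, and `correct_if_italic`, the 30-degree search range, and the 5-degree italic threshold are all assumptions made for illustration, not values taken from the paper.

```python
import math
import numpy as np


def shear(img, angle_deg):
    """Horizontally shear a 2-D binary image.

    A positive angle moves upper rows to the left, which undoes a
    rightward (italic) slant of the same angle. The bottom row stays fixed.
    """
    h, w = img.shape
    t = math.tan(math.radians(angle_deg))
    pad = math.ceil(abs(t) * (h - 1))  # room for the largest row shift
    out = np.zeros((h, w + 2 * pad), dtype=img.dtype)
    for y in range(h):
        shift = int(round(-t * (h - 1 - y)))  # rows nearer the top move further
        out[y, pad + shift: pad + shift + w] = img[y]
    return out


def estimate_slant(img, max_angle=30):
    """Return the integer angle (degrees) that best deslants the image.

    Score: sum of squared column counts of the sheared image. Upright
    strokes concentrate ink in few columns, which maximizes this score.
    (Assumption: an unweighted stand-in for the paper's weighted histogram.)
    """
    best_angle, best_score = 0, -1.0
    for angle in range(-max_angle, max_angle + 1):
        profile = shear(img, angle).sum(axis=0).astype(float)
        score = float((profile ** 2).sum())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def correct_if_italic(img, threshold_deg=5):
    """Two-step use: decide italic first, then correct by shear if needed.

    Returns (is_italic, slant_angle, corrected_image). The threshold is a
    hypothetical value chosen for this sketch.
    """
    angle = estimate_slant(img)
    is_italic = angle >= threshold_deg
    corrected = shear(img, angle) if is_italic else img
    return is_italic, angle, corrected


def make_stroke(height=40, width=30, slant_deg=0, x0=10, thickness=3):
    """Build a synthetic single-stroke test image with a known slant."""
    t = math.tan(math.radians(slant_deg))
    img = np.zeros((height, width), dtype=np.uint8)
    for y in range(height):
        x = x0 + int(round(t * (height - 1 - y)))
        img[y, x: x + thickness] = 1
    return img
```

For example, `correct_if_italic(make_stroke(slant_deg=15))` reports the synthetic stroke as italic and returns a sheared copy in which the stroke is vertical, while an upright stroke is passed through unchanged. Real character images are noisier than a single stroke, which is presumably why the paper weights the histogram and selects a specific character area before testing for italics.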




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

Cite this paper

Xia, Y., Wang, CH., Dai, RW. (2006). Segmentation of Mixed Chinese/English Document Including Scattered Italic Characters. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_2

  • DOI: https://doi.org/10.1007/11940098_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science, Computer Science (R0)
