Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 131)

Abstract

The text in stylistic documents may have different orientations: text lines may be curved, and they may not be parallel to one another within a page. As a result, extracting and subsequently recognizing individual text lines and words in such documents is difficult. Thinning, which reduces characters to a one-pixel-wide representation, is one of the most crucial phases of text recognition, and its success depends on how well it retains the original character shape. Existing thinning algorithms have difficulty with the distinct, non-isolated boundaries and complex character shapes found in different scripts, and they produce unwanted edges. This paper presents an improved thinning algorithm that does not produce unwanted edges; it is used to obtain the path of the text for a curved-text-line straightening system for Optical Character Recognition (OCR). In experiments on documents containing curved English or Hindi text, visual inspection shows that the proposed method yields promising results.
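To make the idea of thinning concrete, here is a minimal sketch of the classic Zhang-Suen parallel thinning algorithm. It is swapped in purely for illustration and is not the improved algorithm proposed in this paper; classic algorithms of this kind are exactly the ones that can leave unwanted edges on complex character shapes. The sketch assumes the page has already been binarized into a list of rows of 0/1 values (1 = ink) with a one-pixel background margin; the function name `zhang_suen_thin` is hypothetical.

```python
def zhang_suen_thin(image):
    """Return a one-pixel-wide skeleton of a binary image (Zhang-Suen thinning).

    `image` is a list of rows of 0/1 values (1 = ink). Border pixels are never
    examined, so the input is assumed to have a background margin. The input
    is not modified.
    """
    img = [list(row) for row in image]
    rows = len(img)
    cols = len(img[0]) if rows else 0

    def neighbours(r, c):
        # P2..P9: clockwise from north (N, NE, E, SE, S, SW, W, NW).
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = n
                    b = sum(n)  # number of ink neighbours
                    # a: number of 0 -> 1 transitions in the cyclic sequence P2..P9, P2
                    a = sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)
                    if step == 0:
                        # First sub-iteration: south-east boundary and north-west corner pixels
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        # Second sub-iteration: north-west boundary and south-east corner pixels
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and side_ok:
                        to_delete.append((r, c))
            # Delete in parallel, only after the whole sub-iteration has been scanned
            for r, c in to_delete:
                img[r][c] = 0
            if to_delete:
                changed = True
    return img
```

For example, a 3-pixel-thick horizontal bar is reduced to a single row of ink pixels, with every column holding at most one pixel. Deleting pixels only at the end of each sub-iteration is what makes the method parallel: every decision in a pass is based on the same snapshot of the image.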
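The abstract does not say how the traced text path is used to straighten a curved line. One simple possibility, shown here only as an illustrative sketch and not as the authors' system, is to shift every column vertically so that the path lies on a single horizontal row. The function name `straighten_by_path`, the mean-row path estimate, the gap-filling rule and the choice of target row are all assumptions made for this example; the skeleton can come from any thinning algorithm.

```python
def straighten_by_path(image, skeleton):
    """Shift each column of `image` so the skeleton path becomes horizontal.

    Both arguments are lists of rows of 0/1 values with the same size.
    Hypothetical helper for illustration only.
    """
    rows, cols = len(image), len(image[0])
    # Path row per column: mean row of the skeleton pixels, None where there is a gap.
    path = []
    for c in range(cols):
        hits = [r for r in range(rows) if skeleton[r][c]]
        path.append(sum(hits) / len(hits) if hits else None)
    known = [p for p in path if p is not None]
    if not known:
        return [list(row) for row in image]  # no path found: return an unchanged copy
    # Fill gaps by carrying the last known path row forward (leading gaps use the first one).
    last = known[0]
    for c in range(cols):
        if path[c] is None:
            path[c] = last
        else:
            last = path[c]
    # Align every column to the average path row.
    target = round(sum(known) / len(known))
    out = [[0] * cols for _ in range(rows)]
    for c in range(cols):
        shift = target - round(path[c])
        for r in range(rows):
            nr = r + shift
            if image[r][c] and 0 <= nr < rows:  # pixels shifted off the image are dropped
                out[nr][c] = 1
    return out
```

For an arched path running through rows 1, 2, 3, 2, 1, every column is moved onto row 2. A real system would also need to handle multi-row strokes, descenders and rotation, which this column-shift sketch ignores.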

Author information

Corresponding author: Brijmohan Singh.

Copyright information

© 2012 Springer India Pvt. Ltd.

About this paper

Cite this paper

Singh, B., Goswami, S., Goyal, P., Mittal, A. (2012). A Robust Thinning Algorithm for Straightening of Curved Text Line. In: Deep, K., Nagar, A., Pant, M., Bansal, J. (eds) Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011) December 20-22, 2011. Advances in Intelligent and Soft Computing, vol 131. Springer, New Delhi. https://doi.org/10.1007/978-81-322-0491-6_83

  • DOI: https://doi.org/10.1007/978-81-322-0491-6_83

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-0490-9

  • Online ISBN: 978-81-322-0491-6

  • eBook Packages: Engineering, Engineering (R0)