A Restoration and Segmentation Unit for the Historic Persian Documents

  • Conference paper
Advanced Concepts for Intelligent Vision Systems (ACIVS 2005)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3708)

Abstract

This paper presents a document restoration and segmentation algorithm for historic Middle Persian (Pahlavi) manuscripts. The proposed algorithm uses mathematical morphology and connected-component analysis to segment overlapping lines, words, and characters in medieval Persian documents in preparation for OCR. To evaluate the restoration algorithm, 200 pages of Pahlavi documents are used as experimental data. Numerical results indicate that the proposed algorithm can remove noise and other degradation effects. The results also show 99.14% accuracy for baseline detection, 97.35% accuracy for text-line extraction and removal of overlaps from other lines, and 99.5% accuracy for segmenting the extracted text lines into their components.
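
The abstract names two building blocks, mathematical morphology and connected components, without giving the details of how the paper combines them. The following is a minimal Python sketch of that general idea, not the authors' method: a morphological opening removes speckle noise, a horizontal projection finds text-line bands, and connected components are grouped by band. The 3x3 square structuring element, the projection-based line finder, and all function names are assumptions made for illustration.

```python
from collections import deque

def _morph(img, reduce_fn):
    """Apply a 3x3 square structuring element (assumed); outside pixels count as 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[y][x] if 0 <= y < h and 0 <= x < w else 0
                      for y in range(r - 1, r + 2) for x in range(c - 1, c + 2)]
            out[r][c] = 1 if reduce_fn(window) else 0
    return out

def erode(img):
    return _morph(img, all)   # ink survives only if the whole window is ink

def dilate(img):
    return _morph(img, any)   # ink spreads to any pixel touching ink

def opening(img):
    """Erosion followed by dilation: removes blobs smaller than the element."""
    return dilate(erode(img))

def text_lines(img):
    """Return (top, bottom) row bands that contain ink (horizontal projection)."""
    bands, start = [], None
    for r, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = r
        elif not has_ink and start is not None:
            bands.append((start, r - 1))
            start = None
    if start is not None:
        bands.append((start, len(img) - 1))
    return bands

def label_components(img):
    """8-connected components; returns bounding boxes (top, left, bottom, right)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(y - 1, y + 2):
                    for nx in range(x - 1, x + 2):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def segment(img):
    """Clean the page, then group component boxes by the line band they fall in."""
    clean = opening(img)
    bands = text_lines(clean)
    lines = [[] for _ in bands]
    for box in label_components(clean):
        centre = (box[0] + box[2]) / 2
        for i, (top, bottom) in enumerate(bands):
            if top <= centre <= bottom:
                lines[i].append(box)
                break
    for line in lines:
        line.sort(key=lambda b: b[1])  # ordered by left edge; reading order is left to the caller
    return lines
```

Calling `segment` on a binary page (a list of rows with 1 for ink and 0 for background) returns one list of bounding boxes per detected line. A fixed 3x3 element would erase strokes thinner than three pixels, so on real scans the element size would have to be chosen from the stroke width, and a plain projection cannot separate lines whose strokes overlap vertically, which is the harder case the paper addresses.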

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alirezaee, S., Fard, A.S., Aghaeinia, H., Faez, K. (2005). A Restoration and Segmentation Unit for the Historic Persian Documents. In: Blanc-Talon, J., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2005. Lecture Notes in Computer Science, vol 3708. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558484_85

  • DOI: https://doi.org/10.1007/11558484_85

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29032-2

  • Online ISBN: 978-3-540-32046-3

  • eBook Packages: Computer Science, Computer Science (R0)
