Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 285)

Abstract

Converting a grayscale or color document image into a binary image is a key step in most Optical Character Recognition (OCR) and document analysis systems. After digitization, document images often suffer from poor contrast, noise, non-uniform lighting, and shadows. In addition, when a page of a book is digitized with a scanner or a camera, border noise, that is, unwanted text from the adjacent page, may appear. In this paper we present a simple and efficient method for document image cleanup that combines border noise removal with enhancement based on retinex theory and a global threshold. The proposed method produces higher-quality results than previous work.
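
The enhancement idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given here: it assumes a plain single-scale retinex (log of the image minus log of a Gaussian-blurred illumination estimate) followed by Otsu's global threshold, and it leaves out border noise removal entirely. The function names and the `sigma` parameter are illustrative choices.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (NumPy only)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Blur along rows, then along columns; "valid" restores the input size.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def single_scale_retinex(gray, sigma=15.0):
    """Assumed retinex variant: log(I) - log(Gaussian-blurred I)."""
    img = gray.astype(np.float64) + 1.0  # offset avoids log(0)
    return np.log(img) - np.log(gaussian_blur(img, sigma))


def otsu_threshold(img_u8):
    """Otsu's global threshold for an 8-bit image."""
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # class-0 probability
    mu = np.cumsum(prob * np.arange(256))     # class-0 cumulative mean
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / denom
    sigma_b = np.where(denom > 0, sigma_b, 0.0)
    return int(np.argmax(sigma_b))


def binarize(gray, sigma=15.0):
    """Return a boolean mask: True = background, False = ink."""
    r = single_scale_retinex(gray, sigma)
    span = r.max() - r.min()
    norm = np.zeros_like(r) if span == 0 else (r - r.min()) / span
    r_u8 = np.round(255 * norm).astype(np.uint8)
    return r_u8 > otsu_threshold(r_u8)
```

Calling `binarize(page, sigma=5.0)` on a 2-D `uint8` array returns a mask of the same shape. Dividing out the blurred illumination in the log domain is what lets a single global threshold cope with a lighting gradient; the right `sigma` depends on stroke width and image resolution and would need tuning per dataset.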

Author information

Corresponding author

Correspondence to Marian Wagdy.

Copyright information

© 2014 Springer Science+Business Media Singapore

About this paper

Cite this paper

Wagdy, M., Faye, I., Rohaya, D. (2014). Border Noise Removal and Clean Up Based on Retinex Theory. In: Herawan, T., Deris, M., Abawajy, J. (eds) Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). Lecture Notes in Electrical Engineering, vol 285. Springer, Singapore. https://doi.org/10.1007/978-981-4585-18-7_39

  • DOI: https://doi.org/10.1007/978-981-4585-18-7_39

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-4585-17-0

  • Online ISBN: 978-981-4585-18-7

  • eBook Packages: Engineering, Engineering (R0)
