Abstract
Converting a grayscale or color document image into a binary image is a main step in most Optical Character Recognition (OCR) and document analysis systems. After digitization, document images often suffer from poor contrast, noise, non-uniform lighting, and shadows. In addition, when a page of a book is digitized with a scanner or a camera, border noise, that is, unwanted text from the adjacent page, may appear. In this paper we present a simple and efficient method for cleaning up document images through border noise removal and enhancement based on retinex theory and a global threshold. The proposed method produces high-quality results compared with previous work.
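The abstract names only the two ingredients of the enhancement stage, retinex theory and a global threshold, without giving details. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes a single-scale retinex-style step (log of the image minus log of a smoothed illumination estimate), uses a plain box filter as the illumination estimate, and picks Otsu's method as the global threshold. All function names (`box_blur`, `retinex_enhance`, `otsu_threshold`, `binarize`) and the synthetic test page are hypothetical, and border noise removal is not covered.

```python
import math

def box_blur(img, radius):
    """Mean filter (window truncated at the borders); an assumed illumination estimate."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out

def retinex_enhance(img, radius=3):
    """Retinex-style reflectance: log(image) - log(illumination), rescaled to 0..255."""
    illum = box_blur(img, radius)
    refl = [[math.log(v + 1.0) - math.log(l + 1.0) for v, l in zip(row, lrow)]
            for row, lrow in zip(img, illum)]
    lo = min(min(row) for row in refl)
    hi = max(max(row) for row in refl)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat image
    return [[int(round(255 * (v - lo) / span)) for v in row] for row in refl]

def otsu_threshold(img):
    """Global threshold maximizing between-class variance (Otsu); assumed choice."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(img, radius=3):
    """Enhance, then apply one global threshold: 0 = ink, 1 = background."""
    enhanced = retinex_enhance(img, radius)
    t = otsu_threshold(enhanced)
    return [[0 if v <= t else 1 for v in row] for row in enhanced]

# Synthetic page (hypothetical data): brightness falls from left to right,
# and isolated "ink" pixels are half as bright as the local background.
H, W = 12, 24

def is_ink(y, x):
    return y % 4 == 2 and x % 6 == 3

page = [[(200 - 6 * x) * (0.5 if is_ink(y, x) else 1.0) for x in range(W)]
        for y in range(H)]
result = binarize(page)
```

On this synthetic page the darkest background pixel is darker than the brightest ink pixel, so no single threshold on the raw intensities separates the two classes; dividing out the illumination estimate first is what makes a global threshold workable.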
Copyright information
© 2014 Springer Science+Business Media Singapore
About this paper
Cite this paper
Wagdy, M., Faye, I., Rohaya, D. (2014). Border Noise Removal and Clean Up Based on Retinex Theory. In: Herawan, T., Deris, M., Abawajy, J. (eds) Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). Lecture Notes in Electrical Engineering, vol 285. Springer, Singapore. https://doi.org/10.1007/978-981-4585-18-7_39
DOI: https://doi.org/10.1007/978-981-4585-18-7_39
Publisher Name: Springer, Singapore
Print ISBN: 978-981-4585-17-0
Online ISBN: 978-981-4585-18-7
eBook Packages: Engineering, Engineering (R0)