Clutter noise removal in binary document images

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

The paper presents a clutter detection and removal algorithm for complex document images. This distance-transform-based technique aims to remove irregular and independent unwanted clutter while preserving the text content. The novelty of this approach lies in its approximation of the clutter–content boundary when the clutter is attached to the content in irregular ways. As an intermediate step, a residual image is created, which forms the basis for clutter detection and removal. Clutter detection and removal are independent of the clutter's position, size, shape, and connectivity with text. The method is tested on a collection of highly degraded and noisy, machine-printed and handwritten Arabic and English documents, and results show pixel-level accuracies of 99.18% and 98.67% for clutter detection and removal, respectively. The approach is also extended to documents containing a mix of clutter and salt-and-pepper noise.
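The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind distance-transform-based clutter detection, not the authors' method. It assumes a binary image stored as a list of rows with 1 for ink and 0 for background, uses a two-pass city-block distance transform, and treats pixels outside the image as background. The function names and the `max_stroke_half_width` threshold are hypothetical. Thresholding the distance map keeps only the cores of blobs thicker than a text stroke, which loosely illustrates the role a residual image could play.

```python
def cityblock_distance_transform(img):
    """Distance of each ink pixel to the nearest background pixel (city-block metric)."""
    h, w = len(img), len(img[0])
    inf = h + w  # safe upper bound on any city-block distance in the image
    dist = [[inf if img[y][x] else 0 for x in range(w)] for y in range(h)]

    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                up = dist[y - 1][x] if y > 0 else 0
                left = dist[y][x - 1] if x > 0 else 0
                dist[y][x] = min(dist[y][x], up + 1, left + 1)

    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                down = dist[y + 1][x] if y < h - 1 else 0
                right = dist[y][x + 1] if x < w - 1 else 0
                dist[y][x] = min(dist[y][x], down + 1, right + 1)
    return dist


def thick_cores(img, max_stroke_half_width):
    """Keep only ink pixels lying deeper inside a blob than a text stroke could be."""
    dist = cityblock_distance_transform(img)
    return [[1 if d > max_stroke_half_width else 0 for d in row] for row in dist]


# Toy example: a 5x5 blob (clutter-like) with a 1-pixel-wide stroke attached to it.
demo = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
demo_dist = cityblock_distance_transform(demo)
demo_cores = thick_cores(demo, max_stroke_half_width=1)
for row in demo_cores:
    print("".join("#" if v else "." for v in row))
```

In this toy image the thin stroke vanishes from the thresholded map while the interior of the blob survives. Recovering the full clutter region from such cores, and deciding where clutter ends and attached text begins, is the harder part that the paper addresses and that this sketch leaves out.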

Author information

Corresponding author

Correspondence to Mudit Agrawal.

About this article

Cite this article

Agrawal, M., Doermann, D. Clutter noise removal in binary document images. IJDAR 16, 351–369 (2013). https://doi.org/10.1007/s10032-012-0196-6

  • DOI: https://doi.org/10.1007/s10032-012-0196-6
