
Low quality document image modeling and enhancement

  • Original Paper

International Journal of Document Analysis and Recognition (IJDAR)

Abstract

To tackle problems such as show-through and bleed-through, a novel defect model is developed that generates images of physically degraded documents. The model addresses physical degradation such as aging and ink seepage. Because these defects are diffusive in nature, the model is built on virtual diffusion processes. Based on this degradation model, a restoration method is then proposed that removes the bleed-through effect from double-sided document images by means of a reverse diffusion process. Both the degradation model and the restoration method are evaluated subjectively and objectively, and the experiments show promising results on both real and generated data.
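The abstract frames bleed-through as ink from the back page diffusing through the paper. The sketch below illustrates only that general idea and is not the authors' model. It assumes linear isotropic diffusion (the heat equation) on a uniform pixel grid, a left-right mirror of the verso as seen from the recto side, and a simple two-layer compositing rule. The names `diffuse` and `simulate_bleed_through` and the `steps`, `dt`, and `strength` parameters are hypothetical; the paper's diffusion process may differ (for example, it may be nonlinear or anisotropic).

```python
from typing import List

# Grayscale ink-density image: 0.0 = clean paper, 1.0 = full ink.
Image = List[List[float]]


def diffuse(img: Image, steps: int, dt: float = 0.2) -> Image:
    """Run `steps` explicit finite-difference iterations of the isotropic
    heat equation u_t = Laplacian(u). Border pixels reuse their own value
    for missing neighbours (zero flux), so total ink mass is conserved.
    dt <= 0.25 keeps this 2-D explicit scheme stable."""
    if not 0 < dt <= 0.25:
        raise ValueError("dt must be in (0, 0.25] for stability")
    h, w = len(img), len(img[0])
    u = [row[:] for row in img]  # work on a copy; the input is not modified
    for _ in range(steps):
        nxt = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                c = u[y][x]
                up = u[y - 1][x] if y > 0 else c
                down = u[y + 1][x] if y < h - 1 else c
                left = u[y][x - 1] if x > 0 else c
                right = u[y][x + 1] if x < w - 1 else c
                nxt[y][x] = c + dt * (up + down + left + right - 4.0 * c)
        u = nxt
    return u


def simulate_bleed_through(recto: Image, verso: Image,
                           steps: int = 10, strength: float = 0.4) -> Image:
    """Hypothetical bleed-through generator: the verso ink is mirrored
    left-right (as it appears from the recto side), spread by diffusion to
    mimic seepage through the paper, attenuated by `strength`, and then
    composited onto the recto as two overlapping absorbing ink layers."""
    mirrored = [row[::-1] for row in verso]
    seeped = diffuse(mirrored, steps)
    return [[1.0 - (1.0 - r) * (1.0 - strength * s)
             for r, s in zip(r_row, s_row)]
            for r_row, s_row in zip(recto, seeped)]


if __name__ == "__main__":
    recto = [[0.0] * 7 for _ in range(5)]
    verso = [[0.0] * 7 for _ in range(5)]
    verso[2][1] = 1.0  # a single ink dot on the back page
    for row in simulate_bleed_through(recto, verso, steps=5):
        print(" ".join(f"{v:.2f}" for v in row))
```

Only the forward degradation is shown. Running diffusion backward in time, as the restoration step described above does, is ill-posed: a naive negative time step in this scheme amplifies noise, so a practical reverse-diffusion method needs some form of regularization that this sketch does not attempt.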



Author information


Corresponding author

Correspondence to Reza Farrahi Moghaddam.


About this article

Cite this article

Moghaddam, R.F., Cheriet, M. Low quality document image modeling and enhancement. IJDAR 11, 183–201 (2009). https://doi.org/10.1007/s10032-008-0076-2


  • DOI: https://doi.org/10.1007/s10032-008-0076-2
