Building Super-Resolution Image Generator for OCR Accuracy Improvement

  • Conference paper
Document Analysis Systems (DAS 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12116)

Abstract

Super-resolving a low-resolution (LR) document image not only enhances the visual quality and readability of the text but also improves optical character recognition (OCR) accuracy. However, beyond the ill-posed nature of the image super-resolution (SR) problem, recovering the finer details of text at large upscale factors while suppressing noise and artifacts remains challenging, especially for low-quality document images. To boost OCR accuracy, we propose a generative adversarial network (GAN) based framework in which an SR image generator and a document image quality discriminator are constructed. To obtain high-quality SR document images, multiple losses are designed to encourage the generator to learn the structural properties of text. Meanwhile, the quality discriminator is trained with a relativistic loss function. With the proposed framework, the resulting SR document images preserve texture details while removing background noise, and they achieve better OCR performance on public databases. The source code and pre-trained models are available at https://gitlab.com/xujun.peng/doc-super-resolution.
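
As a rough illustration of what a relativistic loss looks like, the sketch below computes the paired "relativistic standard GAN" losses from raw discriminator scores. The abstract does not state which relativistic variant the paper uses, so this form is an assumption chosen for brevity, and the names `log_sigmoid` and `relativistic_losses` are hypothetical rather than taken from the authors' repository.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    # Identity: log(sigmoid(z)) = min(z, 0) - log(1 + exp(-|z|))
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))


def relativistic_losses(real_scores, fake_scores):
    """Return (discriminator_loss, generator_loss) for paired raw scores.

    Assumed variant: paired relativistic standard GAN, where the
    discriminator is rewarded when a real image scores higher than
    the generated image it is paired with, and the generator is
    rewarded for the opposite ordering.
    """
    if not real_scores or len(real_scores) != len(fake_scores):
        raise ValueError("need two non-empty score lists of equal length")
    n = len(real_scores)
    pairs = list(zip(real_scores, fake_scores))
    d_loss = -sum(log_sigmoid(r - f) for r, f in pairs) / n
    g_loss = -sum(log_sigmoid(f - r) for r, f in pairs) / n
    return d_loss, g_loss


if __name__ == "__main__":
    # Hypothetical scores: real HR crops vs. generated SR crops.
    d_loss, g_loss = relativistic_losses([2.0, 1.5], [-1.0, 0.5])
    print(f"discriminator loss: {d_loss:.4f}")
    print(f"generator loss:     {g_loss:.4f}")
```

Unlike a standard GAN loss, each term depends only on the difference between a real score and a generated score, so the discriminator judges relative rather than absolute realism.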

Author information

Corresponding author

Correspondence to Xujun Peng.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Peng, X., Wang, C. (2020). Building Super-Resolution Image Generator for OCR Accuracy Improvement. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_11

  • DOI: https://doi.org/10.1007/978-3-030-57058-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57057-6

  • Online ISBN: 978-3-030-57058-3

  • eBook Packages: Computer Science, Computer Science (R0)
