
TG²: text-guided transformer GAN for restoring document readability and perceived quality

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Most image enhancement methods aimed at restoring digitized textual documents are limited to cases where the text information is still preserved in the input image, which often is not the case. In this work, we propose a novel generative document restoration method that allows conditioning the restoration on a guiding signal in the form of a target text transcription and that does not need paired high- and low-quality images for training. We introduce a neural network architecture with an implicit text-to-image alignment module. We demonstrate good results on inpainting, debinarization, and deblurring tasks, and we show that the trained models can be used to manually alter text in document images. A user study shows that human observers confuse the outputs of the proposed enhancement method with reference high-quality images in as many as 30% of cases.
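To make the idea concrete, below is a minimal, hypothetical sketch (in PyTorch) of a text-conditioned restoration generator of the kind the abstract describes: features of the degraded image attend to character embeddings of the target transcription via cross-attention, an implicit form of text-to-image alignment. All module names, dimensions, and layer choices here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TextGuidedGenerator(nn.Module):
    """Toy generator: restore an image patch conditioned on its transcription.
    Hypothetical sketch; not the TG^2 architecture from the paper."""

    def __init__(self, vocab_size=100, dim=128):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, dim)
        # Encode the degraded image into a grid of feature vectors.
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=4, stride=4),
            nn.ReLU(),
        )
        # Implicit text-to-image alignment: image features (queries)
        # attend to character embeddings of the target transcription.
        self.align = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Decode the fused features back to an image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, 3, kernel_size=4, stride=4),
            nn.Tanh(),
        )

    def forward(self, image, text_ids):
        feats = self.img_encoder(image)               # (B, C, h, w)
        b, c, h, w = feats.shape
        seq = feats.flatten(2).transpose(1, 2)        # (B, h*w, C)
        txt = self.char_embed(text_ids)               # (B, T, C)
        fused, _ = self.align(query=seq, key=txt, value=txt)
        fused = (seq + fused).transpose(1, 2).reshape(b, c, h, w)
        return self.decoder(fused)

# Toy usage: restore a 32x128 text-line crop given 20 target character ids.
gen = TextGuidedGenerator()
degraded = torch.randn(1, 3, 32, 128)
transcription = torch.randint(0, 100, (1, 20))
restored = gen(degraded, transcription)
print(restored.shape)                                 # torch.Size([1, 3, 32, 128])
```

In the paper's setting the generator is trained adversarially (as a GAN), and conditioning on the transcription is what removes the need for paired high- and low-quality training images; this sketch only illustrates the data flow of text-conditioned restoration.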



Notes

  1. The demonstration tool with the trained newspaper restoration and inpainting models along with image examples is publicly available at https://github.com/DCGM/pero-enhance. The repository also includes training scripts and links to training data.


Acknowledgements

This work has been supported by the Ministry of Culture of the Czech Republic under the NAKI II project PERO (DG18P02OVV055) and by the Ministry of Education, Youth and Sports of the Czech Republic from the National Programme of Sustainability (NPU II), through the project IT4Innovations Excellence in Science under Grant LQ1602. We gratefully acknowledge the support of the NVIDIA Corporation with the donation of one NVIDIA TITAN Xp GPU for this research.

Author information


Corresponding author

Correspondence to Oldřich Kodym.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Kodym, O., Hradiš, M.: TG²: text-guided transformer GAN for restoring document readability and perceived quality. IJDAR 25, 15–28 (2022). https://doi.org/10.1007/s10032-021-00387-z

