
TG²: text-guided transformer GAN for restoring document readability and perceived quality

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Most image enhancement methods aimed at restoring digitized textual documents are limited to cases where the text information is still preserved in the input image, which often is not the case. In this work, we propose a novel generative document restoration method that allows conditioning the restoration on a guiding signal in the form of a target text transcription and that does not need paired high- and low-quality images for training. We introduce a neural network architecture with an implicit text-to-image alignment module. We demonstrate good results on inpainting, debinarization, and deblurring tasks, and we show that the trained models can be used to manually alter text in document images. A user study shows that human observers confuse the outputs of the proposed enhancement method with reference high-quality images in as many as 30% of cases.
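To make the idea concrete, below is a minimal, hypothetical sketch (in PyTorch) of a text-conditioned restoration generator of the kind the abstract describes: features of the degraded image attend to character embeddings of the target transcription via cross-attention, an implicit form of text-to-image alignment. All module names, dimensions, and layer choices here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TextGuidedGenerator(nn.Module):
    """Toy generator: restore an image patch conditioned on its transcription.
    Hypothetical sketch; not the TG^2 architecture from the paper."""

    def __init__(self, vocab_size=100, dim=128):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, dim)
        # Encode the degraded image into a grid of feature vectors.
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=4, stride=4),
            nn.ReLU(),
        )
        # Implicit text-to-image alignment: image features (queries)
        # attend to character embeddings of the target transcription.
        self.align = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Decode the fused features back to an image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, 3, kernel_size=4, stride=4),
            nn.Tanh(),
        )

    def forward(self, image, text_ids):
        feats = self.img_encoder(image)               # (B, C, h, w)
        b, c, h, w = feats.shape
        seq = feats.flatten(2).transpose(1, 2)        # (B, h*w, C)
        txt = self.char_embed(text_ids)               # (B, T, C)
        fused, _ = self.align(query=seq, key=txt, value=txt)
        fused = (seq + fused).transpose(1, 2).reshape(b, c, h, w)
        return self.decoder(fused)

# Toy usage: restore a 32x128 text-line crop given 20 target character ids.
gen = TextGuidedGenerator()
degraded = torch.randn(1, 3, 32, 128)
transcription = torch.randint(0, 100, (1, 20))
restored = gen(degraded, transcription)
print(restored.shape)                                 # torch.Size([1, 3, 32, 128])
```

In the paper's setting the generator is trained adversarially (as a GAN), and conditioning on the transcription is what removes the need for paired high- and low-quality training images; this sketch only illustrates the data flow of text-conditioned restoration.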



Notes

  1. The demonstration tool with the trained newspaper restoration and inpainting models along with image examples is publicly available at https://github.com/DCGM/pero-enhance. The repository also includes training scripts and links to training data.


Acknowledgements

This work has been supported by the Ministry of Culture of the Czech Republic under the NAKI II project PERO (DG18P02OVV055) and by the Ministry of Education, Youth and Sports of the Czech Republic from the National Programme of Sustainability (NPU II), through the project IT4Innovations Excellence in Science under Grant LQ1602. We gratefully acknowledge the support of the NVIDIA Corporation with the donation of one NVIDIA TITAN Xp GPU for this research.

Author information


Corresponding author

Correspondence to Oldřich Kodym.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Kodym, O., Hradiš, M.: TG²: text-guided transformer GAN for restoring document readability and perceived quality. IJDAR 25, 15–28 (2022). https://doi.org/10.1007/s10032-021-00387-z

