
TBM-GAN: Synthetic Document Generation with Degraded Background

  • Conference paper
  • First Online:
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14188)


Abstract

Deep document enhancement models often suffer in real-world applications due to limited annotation and bias in the training data. Moreover, generative models are often prone to a spectral bias towards certain frequencies. The background (noisy) texture is usually harder to learn, as it is composed of different frequency regions. In this work, we propose TBM-GAN, a generative adversarial network (GAN)-based framework to synthesise realistic handwritten documents with degraded backgrounds. In addition to spatial information, TBM-GAN incorporates frequency information in its loss function to focus on complex noisy textures. Overall, we develop an automated pipeline for TBM-GAN and train it with artificially annotated data from publicly available resources. The pipeline provides both text labels and the corresponding pixel-level annotations. We evaluate the quality of the synthetic images on the downstream task of OCR. On text images with noisy historical backgrounds, we observe an \(11\%\) reduction in character error rate when the OCR model is trained with synthetic data from TBM-GAN.
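
The abstract states that TBM-GAN augments the usual spatial reconstruction objective with a frequency-domain term so that the generator attends to the degraded background texture. As an illustration only, the sketch below shows one common way such a combined loss can be written in PyTorch; the function name, the use of L1 distances, and the relative weighting are assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def spatial_frequency_loss(generated: torch.Tensor,
                           target: torch.Tensor,
                           freq_weight: float = 1.0) -> torch.Tensor:
    """Hypothetical combined loss: a pixel-space term plus a frequency-space term.

    A sketch of the general idea only, not the paper's exact formulation.
    """
    # Spatial term: pixel-wise L1 distance between generated and target images.
    spatial_term = F.l1_loss(generated, target)

    # Frequency term: mean magnitude of the difference between the 2-D FFT
    # spectra of the two images. Errors in fine background texture show up
    # strongly here, whereas a pixel-wise loss tends to under-weight them.
    gen_fft = torch.fft.fft2(generated, norm="ortho")
    tgt_fft = torch.fft.fft2(target, norm="ortho")
    frequency_term = (gen_fft - tgt_fft).abs().mean()

    return spatial_term + freq_weight * frequency_term
```

In practice such a term would be added to the generator's adversarial objective; the balance between the spatial, frequency, and adversarial components is a tunable hyperparameter.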



Acknowledgement

This work is partially supported by the Microsoft Academic Partnership Grant (MAPG) 2022-2023, grant number IIT/SRIC/CS/ADD/2022-2023/065.

Author information


Corresponding author

Correspondence to Arnab Poddar.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Poddar, A., Dey, S., Jawanpuria, P., Mukhopadhyay, J., Kumar Biswas, P. (2023). TBM-GAN: Synthetic Document Generation with Degraded Background. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14188. Springer, Cham. https://doi.org/10.1007/978-3-031-41679-8_21


  • DOI: https://doi.org/10.1007/978-3-031-41679-8_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41678-1

  • Online ISBN: 978-3-031-41679-8

  • eBook Packages: Computer Science, Computer Science (R0)
