ACMU-Nets: Attention Cascading Modular U-Nets Incorporating Squeeze and Excitation Blocks

  • Conference paper
Document Analysis Systems (DAS 2020)

Abstract

In document analysis research, image-to-image conversion models such as U-Net have shown significant performance. Recently, cascaded U-Nets have been proposed for solving complex document analysis tasks. However, improving performance by adding U-Net modules greatly increases the number of parameters in cascaded U-Nets. Therefore, in this paper, we propose a method for enhancing the performance of cascaded U-Nets. We suggest a novel document image binarization method that utilizes Cascading Modular U-Nets (CMU-Nets) and Squeeze and Excitation blocks (SE-blocks). Through verification experiments, we point out the problems caused by the use of SE-blocks in existing CMU-Nets and suggest how to use SE-blocks in CMU-Nets. We use the Document Image Binarization Competition (DIBCO) 2017 dataset to evaluate the proposed model.
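
For readers unfamiliar with SE-blocks, the following is a minimal sketch of a generic squeeze-and-excitation gate in plain Python. It is not the authors' implementation: the abstract does not state where the blocks are placed inside CMU-Nets, and the function name `se_block`, the weight layout, and the toy sizes are assumptions made only for illustration.

```python
import math
import random


def se_block(feature_map, w1, b1, w2, b2):
    """Apply a squeeze-and-excitation gate to a feature map.

    feature_map: list of C channels, each an H x W list of floats.
    w1, b1: weights (C_reduced x C) and biases of the reduction layer.
    w2, b2: weights (C x C_reduced) and biases of the expansion layer.
    All names and shapes here are illustrative assumptions.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: fully connected reduction followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(w_row, squeezed)) + b)
        for w_row, b in zip(w1, b1)
    ]
    # Excitation, step 2: fully connected expansion followed by a sigmoid,
    # producing one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(w_row, hidden)) + b)))
        for w_row, b in zip(w2, b2)
    ]
    # Scale: multiply every pixel of a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy usage with hypothetical sizes: 4 channels, reduced to 2 (ratio 2).
random.seed(0)
channels, reduced = 4, 2
x = [[[random.random() for _ in range(3)] for _ in range(3)]
     for _ in range(channels)]
w1 = [[random.uniform(-1, 1) for _ in range(channels)] for _ in range(reduced)]
w2 = [[random.uniform(-1, 1) for _ in range(reduced)] for _ in range(channels)]
y, gates = se_block(x, w1, [0.0] * reduced, w2, [0.0] * channels)
```

In this generic form the block adds only two small fully connected layers per insertion point (roughly 2·C·C/r weights for C channels and reduction ratio r), which is why channel attention of this kind is a lighter alternative to stacking further U-Net modules.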

Acknowledgement

This work was supported by JSPS KAKENHI Grant Numbers JP17K19402 and JP17H06100.

Author information

Corresponding author

Correspondence to Seokjun Kang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kang, S., Iwana, B.K., Uchida, S. (2020). ACMU-Nets: Attention Cascading Modular U-Nets Incorporating Squeeze and Excitation Blocks. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_9

  • DOI: https://doi.org/10.1007/978-3-030-57058-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57057-6

  • Online ISBN: 978-3-030-57058-3

  • eBook Packages: Computer Science, Computer Science (R0)
