ACMU-Nets: Attention Cascading Modular U-Nets Incorporating Squeeze and Excitation Blocks

Kang, Seokjun; Iwana, Brian Kenji; Uchida, Seiichi

doi:10.1007/978-3-030-57058-3_9

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12116)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

In document analysis research, image-to-image conversion models such as U-Net have shown significant performance. Recently, cascaded U-Nets have been proposed for solving complex document analysis tasks. However, improving the performance of cascaded U-Nets by adding U-Net modules requires too many parameters. Therefore, in this paper, we propose a method for enhancing the performance of cascaded U-Nets. We suggest a novel document image binarization method that utilizes Cascading Modular U-Nets (CMU-Nets) and Squeeze and Excitation blocks (SE-blocks). Through verification experiments, we point out the problems caused by the use of SE-blocks in existing CMU-Nets and suggest how to use SE-blocks in CMU-Nets. We use the Document Image Binarization Competition (DIBCO) 2017 dataset to evaluate the proposed model.
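
The abstract names SE-blocks without describing their internals or how they are attached to CMU-Nets. As a rough illustration only, the following sketch shows a generic squeeze-and-excitation block as it is commonly described: global average pooling per channel (squeeze), a two-layer bottleneck with ReLU and sigmoid (excitation), and channel-wise rescaling. The function name `se_block`, the parameter names `w1`, `b1`, `w2`, `b2`, the nested-list tensor layout, and the reduction ratio of 2 are assumptions made for this example; it is not the configuration used in ACMU-Nets.

```python
import math


def se_block(feature_map, w1, b1, w2, b2):
    """Generic squeeze-and-excitation sketch (hypothetical helper).

    feature_map: list of C channels, each an H x W nested list of floats.
    w1, b1: weights (C/r x C) and biases (C/r) of the reduction layer.
    w2, b2: weights (C x C/r) and biases (C) of the expansion layer.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: one scalar per channel via global average pooling.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: fully connected reduction followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)) + b)
        for row, b in zip(w1, b1)
    ]
    # Excitation, step 2: fully connected expansion followed by sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w2, b2)
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


if __name__ == "__main__":
    # Four 2x2 channels, reduction ratio r = 2 (assumed for illustration).
    channels = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]
    w1 = [[0.1] * 4 for _ in range(2)]
    w2 = [[0.2] * 2 for _ in range(4)]
    scaled, gates = se_block(channels, w1, [0.0] * 2, w2, [0.0] * 4)
    print(gates)
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another. Its extra parameters are limited to the two small fully connected layers, which is why such blocks are often considered when parameter count is a concern.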

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number JP17K19402 and JP17H06100.

Author information

Authors and Affiliations

Department of Advanced Information Technology, Kyushu University, Fukuoka, Japan
Seokjun Kang, Brian Kenji Iwana & Seiichi Uchida

Authors

Seokjun Kang
Brian Kenji Iwana
Seiichi Uchida

Corresponding author

Correspondence to Seokjun Kang.

Editor information

Editors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Xiang Bai
Autonomous University of Barcelona, Barcelona, Spain
Dimosthenis Karatzas
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti

About this paper

Cite this paper

Kang, S., Iwana, B.K., Uchida, S. (2020). ACMU-Nets: Attention Cascading Modular U-Nets Incorporating Squeeze and Excitation Blocks. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_9

DOI: https://doi.org/10.1007/978-3-030-57058-3_9
Published: 14 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)
