Abstract
In document analysis research, image-to-image conversion models such as U-Net have shown strong performance. Recently, cascaded U-Nets have been proposed for solving complex document analysis tasks. However, improving performance by adding U-Net modules to a cascade requires too many parameters. Therefore, in this paper, we propose a method for enhancing the performance of cascaded U-Nets. We suggest a novel document image binarization method that utilizes Cascading Modular U-Nets (CMU-Nets) and Squeeze and Excitation blocks (SE-blocks). Through verification experiments, we point out the problems caused by the use of SE-blocks in existing CMU-Nets and suggest how to use SE-blocks in CMU-Nets. We use the Document Image Binarization (DIBCO) 2017 dataset to evaluate the proposed model.
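As a rough illustration of the squeeze-and-excitation idea named above, the following is a minimal sketch of a generic SE-block in plain Python. It is not the authors' implementation: the abstract does not say where the blocks sit inside the CMU-Nets, and the function name `se_block`, the bias-free weight matrices `w1` and `w2`, the toy input size, and the reduction ratio of 2 are all assumptions made for this example.

```python
import math
import random


def se_block(feature_map, w1, w2):
    """Apply a squeeze-and-excitation gate to a C x H x W feature map.

    feature_map: list of C channels, each an H x W list of floats.
    w1: (C // r) x C bottleneck weights (hypothetical, biases omitted).
    w2: C x (C // r) expansion weights (hypothetical, biases omitted).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expansion layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: reweight every value of a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input (assumed sizes): 4 channels of 2 x 2 values, reduction ratio 2.
rng = random.Random(0)
channels, reduction = 4, 2
x = [[[rng.random() for _ in range(2)] for _ in range(2)] for _ in range(channels)]
w1 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(channels // reduction)]
w2 = [[rng.uniform(-1, 1) for _ in range(channels // reduction)] for _ in range(channels)]
y, gates = se_block(x, w1, w2)
```

The output `y` has the same shape as `x`; each channel is multiplied by a single learned weight between 0 and 1, which is how an SE-block emphasizes or suppresses whole feature channels at the cost of only two small weight matrices.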
Acknowledgement
This work was supported by JSPS KAKENHI Grant Numbers JP17K19402 and JP17H06100.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kang, S., Iwana, B.K., Uchida, S. (2020). ACMU-Nets: Attention Cascading Modular U-Nets Incorporating Squeeze and Excitation Blocks. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_9
DOI: https://doi.org/10.1007/978-3-030-57058-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)