Abstract
Semantic segmentation, a central problem in computer vision, aims to assign each pixel in an image to a semantic object category. Cognitive research suggests that biological systems rely on both hidden and explicit features: both carry useful information, but hidden features need further processing before they become explicit. Inspired by this theory, we propose a semantic segmentation framework named hierarchical attention network assembling (HANA). The framework exploits auxiliary information at multiple levels, corresponding to the two kinds of features in the cognitive system, and further processes the hidden information to make it explicit for semantic segmentation. Traditional methods, by contrast, use auxiliary tasks only as hidden information, which gives the segmentation task limited assistance. In this study, two auxiliary tasks are introduced as attention modules that give explicit guidance to the semantic segmentation task. Two hierarchical sub-networks, an object-level bounding box attention network and an edge-level boundary attention network, together serve as the explicit auxiliary tasks. The first, driven by object detection, strengthens the consistency constraint on pixels belonging to the same object; the second, driven by boundary detection, improves segmentation accuracy within boundary regions. The proposed method achieves 78.3% mean IoU on PASCAL VOC 2012, which indicates that explicit guidance from the two auxiliary tasks benefits semantic segmentation.
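
The mean IoU figure reported above is the standard metric for this benchmark: for each class, the intersection of predicted and ground-truth pixels divided by their union, averaged over classes. The following is a minimal NumPy sketch of that metric, not the authors' evaluation code. The function name `mean_iou`, its parameters, and the choice to skip classes that appear in neither prediction nor ground truth are illustrative assumptions; the default `ignore_index=255` follows the PASCAL VOC convention of marking void pixels with label 255.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union between two integer label maps.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Drop pixels whose ground-truth label is the void/ignore label.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows index ground-truth classes, columns index predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    # Union = predicted pixels + ground-truth pixels - pixels counted in both.
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that occur in the prediction or the ground truth.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


if __name__ == "__main__":
    pred = np.array([[0, 0], [1, 1]])
    target = np.array([[0, 1], [1, 1]])
    print(mean_iou(pred, target, num_classes=2))
```

Benchmark scores are normally obtained by accumulating the confusion matrix over the entire validation or test set before taking the ratio, rather than by averaging per-image values; the sketch handles a single pair of label maps for brevity.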

Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of Interest
The authors declare no competing interests.
About this article
Cite this article
Liu, W., Li, D. & Su, H. HANA: Hierarchical Attention Network Assembling for Semantic Segmentation. Cogn Comput 13, 1128–1135 (2021). https://doi.org/10.1007/s12559-021-09911-z
DOI: https://doi.org/10.1007/s12559-021-09911-z