Abstract
Visual saliency prediction has become a fundamental problem in 3D imaging. In this paper, we propose a saliency prediction model that addresses three challenges. First, to adequately extract features from RGB and depth information, we design an asymmetric encoder based on a U-shaped architecture. Second, to prevent the semantic information linking salient objects to their contexts from being diluted in the cross-modal distillation stream, we devise a global guidance module that captures high-level feature maps and delivers them to feature maps in shallower layers. Third, to locate and emphasize salient objects, we introduce a channel-wise attention module. Finally, we build a refinement stream with an integrated fusion strategy, gradually refining the saliency maps from coarse to fine-grained. Experiments on two widely used datasets demonstrate the effectiveness of the proposed architecture, and the results show that our model outperforms six representative state-of-the-art models.
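As a rough illustration of the channel-wise attention idea mentioned above (not the authors' exact module), a squeeze-and-excitation-style gate can be sketched in NumPy: spatial statistics are pooled per channel, passed through a small bottleneck, and the resulting sigmoid gate reweights each channel of the feature map. All names, shapes, and the reduction ratio here are assumptions for the sketch.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) feature map with a learned gate.

    w1: (C//r, C) bottleneck weights; w2: (C, C//r) expansion weights.
    """
    # Squeeze: global average pool over the spatial dimensions -> (C,)
    z = feat.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (ReLU) followed by a sigmoid gate -> (C,)
    h = np.maximum(w1 @ z, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ h)))
    # Scale each channel by its gate value, broadcasting over H and W
    return feat * gate[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2            # r is an assumed reduction ratio
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out = channel_attention(feat, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the gate lies in (0, 1), the module can only attenuate channels, which is how such blocks emphasize salient channels relative to the rest.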
Acknowledgement
This work was supported by the Hainan Provincial Natural Science Foundation of China (618QN217) and the National Natural Science Foundation of China (61862021).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhang, X., Jin, T. (2020). Attention-Based Asymmetric Fusion Network for Saliency Prediction in 3D Images. In: Xu, R., De, W., Zhong, W., Tian, L., Bai, Y., Zhang, LJ. (eds) Artificial Intelligence and Mobile Services – AIMS 2020. AIMS 2020. Lecture Notes in Computer Science(), vol 12401. Springer, Cham. https://doi.org/10.1007/978-3-030-59605-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59604-0
Online ISBN: 978-3-030-59605-7