Attention-Based Asymmetric Fusion Network for Saliency Prediction in 3D Images

  • Conference paper
Artificial Intelligence and Mobile Services – AIMS 2020 (AIMS 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12401)

Abstract

Visual saliency prediction has become a fundamental problem in 3D imaging. In this paper, we propose a saliency prediction model designed to address three challenges. First, to adequately extract features from both RGB and depth information, we design an asymmetric encoder built on a U-shaped architecture. Second, to prevent the semantic information linking salient objects to their surrounding context from being diluted in the cross-modal distillation stream, we devise a global guidance module that captures high-level feature maps and delivers them to feature maps in shallower layers. Third, to locate and emphasize salient objects, we introduce a channel-wise attention module. Finally, we build a refinement stream with an integrated fusion strategy that gradually refines the saliency maps from coarse to fine-grained. Experiments on two widely used datasets demonstrate the effectiveness of the proposed architecture, and the results show that our model outperforms six selected state-of-the-art models.
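Only the abstract is freely available here, so as a rough, non-authoritative sketch of two of the ideas it describes, the PyTorch code below implements a squeeze-and-excitation-style channel-wise attention block and a simple global-guidance step that projects and upsamples top-level features into a shallower feature map. All class names, the reduction ratio, and the fusion-by-addition choice are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative sketch).

    Pools each feature map to a single value, runs the pooled vector
    through a bottleneck MLP, and rescales every channel by a sigmoid
    weight, emphasizing channels that respond to salient objects.
    The reduction ratio of 16 is a hypothetical choice.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c))  # per-channel weights in (0, 1)
        return x * w.view(b, c, 1, 1)         # reweight each channel


class GlobalGuidance(nn.Module):
    """Delivers top-level semantics to a shallower layer (illustrative sketch).

    Projects the deepest feature map to the shallow layer's channel count,
    upsamples it to the shallow layer's resolution, and adds it in, so that
    high-level context is not diluted along the decoder path.
    """

    def __init__(self, top_channels: int, shallow_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(top_channels, shallow_channels, kernel_size=1)

    def forward(self, top: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        g = F.interpolate(self.proj(top), size=shallow.shape[-2:],
                          mode="bilinear", align_corners=False)
        return shallow + g


if __name__ == "__main__":
    shallow = torch.randn(2, 64, 64, 64)  # dummy shallow-layer features
    top = torch.randn(2, 512, 8, 8)       # dummy deepest-layer features
    fused = GlobalGuidance(512, 64)(top, shallow)
    out = ChannelAttention(64)(fused)
    print(out.shape)                       # torch.Size([2, 64, 64, 64])
```

In a U-shaped encoder-decoder, blocks like these would typically be applied at each decoder stage; the paper's actual asymmetric encoder and integrated fusion strategy may differ from this sketch.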

Acknowledgement

This work was supported by the Hainan Provincial Natural Science Foundation of China (618QN217) and the National Natural Science Foundation of China (61862021).

Author information

Corresponding author

Correspondence to Ting Jin.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, X., Jin, T. (2020). Attention-Based Asymmetric Fusion Network for Saliency Prediction in 3D Images. In: Xu, R., De, W., Zhong, W., Tian, L., Bai, Y., Zhang, LJ. (eds.) Artificial Intelligence and Mobile Services – AIMS 2020. AIMS 2020. Lecture Notes in Computer Science, vol. 12401. Springer, Cham. https://doi.org/10.1007/978-3-030-59605-7_8

  • DOI: https://doi.org/10.1007/978-3-030-59605-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59604-0

  • Online ISBN: 978-3-030-59605-7

  • eBook Packages: Computer Science; Computer Science (R0)
