IoT-based 3D convolution for video salient object detection

  • Intelligent Biomedical Data Analysis and Processing
Neural Computing and Applications

Abstract

Video salient object detection (SOD) is the first step for devices in the Internet of Things (IoT) to understand the environment around them. Video SOD needs the objects’ motion information across contiguous video frames as well as spatial contrast information from a single video frame. Many IoT devices (e.g., surveillance cameras and smartphones) have low hardware configurations, so their computing power is not sufficient to support the expensive motion estimation of existing SOD methods. To model the objects’ motion information efficiently for SOD, we propose an end-to-end video SOD algorithm with an efficient representation of that motion information. The algorithm contains two major parts: a 3D convolution-based X-shape structure that directly and efficiently represents the motion information in successive video frames, and 2D densely connected convolutional networks (DenseNet) with a pyramid structure that extract rich spatial contrast information from a single video frame. Our method keeps the number of parameters as small as that of a 2D convolutional neural network and represents spatiotemporal information uniformly, which allows it to be trained end-to-end. We evaluate the proposed method on four benchmark datasets. The results show that it achieves state-of-the-art performance compared with five other methods.
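
The claim that a 3D convolution can represent motion directly from successive frames can be illustrated with a minimal sketch. This is not the authors' X-shape network: the helper names `make_clip`, `conv3d_valid`, and `total_abs`, the toy 4×4 clip, and the hand-set temporal-difference kernel are all assumptions made for illustration, whereas a trained network would learn its kernels from data.

```python
# Toy illustration (assumed setup, not the paper's architecture):
# a 3D kernel that spans two frames responds to a moving object
# and gives zero response to a static one.

def make_clip(x_positions, height=4, width=4, row=1):
    """Build a (T, H, W) clip as nested lists with one bright pixel per frame."""
    clip = []
    for x in x_positions:
        frame = [[0.0] * width for _ in range(height)]
        frame[row][x] = 1.0
        clip.append(frame)
    return clip

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation of a (T, H, W) clip with a (kt, kh, kw) kernel."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            out_row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                out_row.append(acc)
            plane.append(out_row)
        out.append(plane)
    return out

def total_abs(volume):
    """Sum of absolute responses over the whole output volume."""
    return sum(abs(v) for plane in volume for r in plane for v in r)

# Hand-set kernel of shape (2, 1, 1): next frame minus current frame.
temporal_diff = [[[-1.0]], [[1.0]]]

moving = make_clip([0, 1, 2])   # bright pixel shifts one column per frame
static = make_clip([1, 1, 1])   # bright pixel stays in place

moving_energy = total_abs(conv3d_valid(moving, temporal_diff))
static_energy = total_abs(conv3d_valid(static, temporal_diff))
print(moving_energy, static_energy)  # 4.0 0.0
```

A 2D convolution applied to each frame separately would produce the same response magnitude for both clips, since every frame contains one bright pixel; only a kernel that extends along the time axis distinguishes the moving case from the static one.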

Funding

This study was funded by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 218165) and the Shenzhen Key Laboratory of Neuropsychiatric Modulation (CN) (Grant No. JCYJ20170307165309009).

Author information

Corresponding author

Correspondence to Gui-Bin Bian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dong, S., Gao, Z., Pirbhulal, S. et al. IoT-based 3D convolution for video salient object detection. Neural Comput & Applic 32, 735–746 (2020). https://doi.org/10.1007/s00521-018-03971-3

  • DOI: https://doi.org/10.1007/s00521-018-03971-3
