Adaptive video object proposals by a context-aware model

Multimedia Tools and Applications

Abstract

Most previous work focuses on image object proposals, and little addresses video object proposals. Moreover, existing work on video object proposals mainly concentrates on localizing the dominant object. In this paper, we explore a unified framework for proposing multiple objects in videos, whether they are in the foreground or the background. The method is derived from image object proposals and makes the best use of video characteristics. To this end, we propose an adaptive context-aware model for video object proposals. First, spatial candidate windows are generated by an image object proposal method to acquire adequate bounding box samples. Temporal boxes are then calculated by motion-based mapping. To account for the mapping loss, we define a box confidence coefficient that helps keep the proposals consistent and suppresses the effect of motion blur. The output proposal bounding boxes are ranked by scores computed with a weighted scoring system. The proposed method is evaluated separately on the proposed multi-object dataset and on a public dataset. Compared with several state-of-the-art methods, our method achieves the most satisfactory overall performance for multi-object proposals in videos.
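The abstract does not give the paper's formulas, so the following is only a minimal sketch of how the three stages it names (spatial candidates from an image method, motion-based mapping of temporal boxes with a confidence coefficient, and weighted scoring for ranking) could fit together. Everything concrete here is a hypothetical assumption, not the authors' method: the names `Box`, `map_box` and `rank_proposals`, shifting each box by the mean optical-flow vector inside it, a fixed geometric `decay` of the confidence per mapping, and the fixed weights `w_spatial` and `w_temporal`. The spatial candidates are taken as given input.

```python
"""Hypothetical sketch: spatial proposals + motion-mapped temporal boxes + weighted ranking.

Assumptions (not taken from the paper): each box is shifted by the mean
optical-flow vector inside it, its confidence coefficient decays by a fixed
factor per mapping, and the final score is a fixed linear weighting.
"""
from dataclasses import dataclass, replace
from typing import List, Tuple

# Dense flow field, flow[y][x] = (dx, dy); hypothetical layout for this sketch.
Flow = List[List[Tuple[float, float]]]


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    objectness: float        # score from the per-frame image proposal method
    confidence: float = 1.0  # hypothetical box confidence coefficient


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def map_box(box: Box, flow: Flow, decay: float = 0.9) -> Box:
    """Map a box from frame t-1 to frame t by the mean flow inside it.

    The confidence is multiplied by `decay` as a crude stand-in for the
    loss accumulated by each motion-based mapping (an assumption).
    """
    h, w = len(flow), len(flow[0])
    xs = range(max(0, int(box.x1)), min(w, int(box.x2)))
    ys = range(max(0, int(box.y1)), min(h, int(box.y2)))
    vecs = [flow[y][x] for y in ys for x in xs]
    if not vecs:  # box lies outside the flow field: keep it in place
        return replace(box, confidence=box.confidence * decay)
    dx = sum(v[0] for v in vecs) / len(vecs)
    dy = sum(v[1] for v in vecs) / len(vecs)
    return replace(box, x1=box.x1 + dx, y1=box.y1 + dy,
                   x2=box.x2 + dx, y2=box.y2 + dy,
                   confidence=box.confidence * decay)


def rank_proposals(spatial: List[Box], temporal: List[Box],
                   w_spatial: float = 0.6,
                   w_temporal: float = 0.4) -> List[Tuple[float, Box]]:
    """Score each spatial box and return (score, box) pairs, best first.

    score = w_spatial * objectness
          + w_temporal * max over temporal boxes of (IoU * confidence)
    """
    scored = []
    for s in spatial:
        support = max((iou(s, t) * t.confidence for t in temporal), default=0.0)
        scored.append((w_spatial * s.objectness + w_temporal * support, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Uniform flow of (+2, 0) on a 20x20 frame: everything moves 2 px right.
    flow = [[(2.0, 0.0)] * 20 for _ in range(20)]
    temporal = [map_box(Box(2, 2, 8, 8, objectness=0.7), flow)]
    spatial = [Box(4, 2, 10, 8, 0.5), Box(12, 12, 18, 18, 0.6)]
    for score, box in rank_proposals(spatial, temporal):
        print(round(score, 2), (box.x1, box.y1, box.x2, box.y2))
```

In the toy run, the spatial box that coincides with the flow-shifted box from the previous frame gains temporal support and ranks first (score 0.66 against 0.36), even though its objectness alone is lower. The mean flow is the simplest possible mapping; a median over the box would be less sensitive to background pixels that fall inside it.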

Figs. 1–18 (images not reproduced)

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61321491 and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Corresponding author

Correspondence to Gangshan Wu.

About this article

Cite this article

Geng, W., Zhang, C. & Wu, G. Adaptive video object proposals by a context-aware model. Multimed Tools Appl 77, 10589–10614 (2018). https://doi.org/10.1007/s11042-017-4561-9

  • DOI: https://doi.org/10.1007/s11042-017-4561-9