Abstract
Most previous work focuses on object proposals in images, and little on object proposals in videos. Moreover, existing studies of video object proposals mainly concentrate on localizing the dominant object. In this paper, we explore a unified framework for proposing multiple objects in videos, regardless of whether they are in the foreground or the background. The method is derived from image object proposals and makes full use of video characteristics. To this end, we propose an adaptive context-aware model for video object proposals. First, spatial candidate windows are generated by an image-based method to obtain adequate bounding-box samples. Temporal boxes are then computed by motion-based mapping. To account for the mapping loss, we define a box confidence coefficient that helps keep the proposals consistent and suppresses the effect of motion blur. The output proposal bounding boxes are ranked by the scores computed by a weighted scoring system. The proposed method is evaluated separately on the proposed multi-object dataset and on a public dataset. Comparisons with several state-of-the-art methods show that our method achieves the most satisfactory overall performance for multi-object proposals in videos.
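The abstract gives no formulas, so the following is only a minimal sketch of how such a pipeline could be wired together, not the authors' implementation. It assumes a dense per-pixel flow field, maps a previous-frame box by the mean flow inside it, uses the best IoU with a mapped box as a stand-in for the box confidence coefficient, and ranks candidates by a weighted sum. The names `map_box`, `rank_proposals`, and `alpha`, the `(x1, y1, x2, y2)` box layout, and the specific scoring rule are all hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); layout is an assumption
Flow = Sequence[Sequence[Tuple[float, float]]]  # flow[y][x] = (dx, dy); assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def map_box(box: Box, flow: Flow) -> Box:
    """Shift a previous-frame box by the mean flow of the pixels it covers.

    Assumes the box lies inside the flow field. This is a simple stand-in
    for motion-based mapping, not the mapping used in the paper.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    vecs = [flow[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    if not vecs:
        return box
    dx = sum(v[0] for v in vecs) / len(vecs)
    dy = sum(v[1] for v in vecs) / len(vecs)
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def rank_proposals(
    candidates: List[Tuple[Box, float]],  # (box, spatial score) for the current frame
    mapped: List[Box],                    # boxes mapped from the previous frame
    alpha: float = 0.5,                   # hypothetical weight between the two terms
) -> List[Box]:
    """Rank candidates by alpha * spatial score + (1 - alpha) * temporal agreement."""
    scored = []
    for box, spatial_score in candidates:
        # Hypothetical confidence: best overlap with any temporally mapped box.
        confidence = max((iou(box, m) for m in mapped), default=0.0)
        scored.append((alpha * spatial_score + (1 - alpha) * confidence, box))
    return [box for _, box in sorted(scored, key=lambda t: t[0], reverse=True)]
```

In this sketch, a candidate with a modest spatial score can still outrank a higher-scoring one if it agrees well with a box propagated from the previous frame, which is one plausible way to encourage temporal consistency.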


















Acknowledgements
This work is supported by the National Science Foundation of China under Grant No. 61321491 and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Cite this article
Geng, W., Zhang, C. & Wu, G. Adaptive video object proposals by a context-aware model. Multimed Tools Appl 77, 10589–10614 (2018). https://doi.org/10.1007/s11042-017-4561-9
DOI: https://doi.org/10.1007/s11042-017-4561-9