ABSTRACT
In recent years, videos have increasingly been preferred over images. However, during recording, cameras are inevitably occluded at times by objects or people passing in front of them, which greatly increases the workload of video editors, who must search out such occlusions. To relieve this burden, this paper proposes a frame-level video occlusion detection method, which is a fundamental component of automatic video editing. Built on C3D, the proposed method enhances the extraction of spatial-temporal information while using only about half as many parameters, and it includes an occlusion correction algorithm that refines the prediction results. In addition, a novel loss function is proposed to better capture the characteristics of occlusion and improve detection performance. For performance evaluation, this paper builds a new large-scale dataset containing 1,000 video segments from seven different real-world scenarios, available at: https://junhua-liao.github.io/Occlusion-Detection/. All occlusions in the video segments are annotated frame by frame with bounding boxes, so the dataset can be used for both frame-level occlusion detection and precise occlusion localization. The experimental results show that the proposed method achieves good performance on video occlusion detection compared with state-of-the-art approaches. To the best of our knowledge, this is the first study that focuses on occlusion detection for automatic video editing.
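Because every occlusion is annotated frame by frame with bounding boxes, frame-level labels can be derived directly from the box annotations. The sketch below illustrates that reduction under a stated assumption: a hypothetical annotation layout (a dict mapping frame index to a list of `(x1, y1, x2, y2)` boxes), since the dataset's actual file format is not described here. A frame counts as occluded when it carries at least one box, and consecutive occluded frames are merged into ranges an editor could review or cut. The helpers `frame_labels` and `occluded_segments` are illustrative names, not the authors' code, and this is not the paper's detection or correction algorithm.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical box format (x1, y1, x2, y2); the real dataset layout may differ.
Box = Tuple[float, float, float, float]


def frame_labels(boxes_per_frame: Dict[int, List[Box]], num_frames: int) -> List[int]:
    """Return 1 for each frame with at least one occlusion box, else 0.

    Frames missing from the mapping (or with an empty box list) count as unoccluded.
    """
    return [1 if boxes_per_frame.get(i) else 0 for i in range(num_frames)]


def occluded_segments(labels: List[int]) -> List[Tuple[int, int]]:
    """Group consecutive occluded frames into inclusive (start, end) ranges."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flag in enumerate(labels):
        if flag and start is None:
            start = i                        # an occluded run begins
        elif not flag and start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:                    # the run extends to the last frame
        segments.append((start, len(labels) - 1))
    return segments


if __name__ == "__main__":
    # Toy annotation: something covers the lens in frames 2-4 and again in frame 7.
    annotations = {
        2: [(0, 0, 120, 200)],
        3: [(0, 0, 300, 400)],
        4: [(10, 0, 200, 380)],
        7: [(500, 100, 640, 360)],
    }
    labels = frame_labels(annotations, num_frames=10)
    print(labels)
    print(occluded_segments(labels))
```

Running it prints `[0, 0, 1, 1, 1, 0, 0, 1, 0, 0]` followed by `[(2, 4), (7, 7)]`. Keeping the boxes alongside the derived labels is what allows the same annotations to serve both frame-level detection and localization.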
Occlusion Detection for Automatic Video Editing