DOI: 10.1145/3394171.3413725

Occlusion Detection for Automatic Video Editing

Published: 12 October 2020

ABSTRACT

In recent years, videos have become preferred over images. However, during recording, cameras are inevitably occluded by objects or people passing in front of them, and locating such occlusions greatly increases the workload of video editors. To ease this burden, this paper proposes a frame-level video occlusion detection method, a fundamental component of automatic video editing. The proposed method enhances the extraction of spatial-temporal information based on C3D while using only about half the parameters, and applies an occlusion correction algorithm to refine the prediction results. In addition, a novel loss function is proposed to better characterize occlusions and improve detection performance. For performance evaluation, this paper builds a new large-scale dataset containing 1,000 video segments from seven different real-world scenarios, available at: https://junhua-liao.github.io/Occlusion-Detection/. All occlusions in the video segments are annotated frame by frame with bounding boxes, so the dataset can be used for both frame-level occlusion detection and precise occlusion localization. Experimental results show that the proposed method achieves good performance on video occlusion detection compared with state-of-the-art approaches. To the best of our knowledge, this is the first study to focus on occlusion detection for automatic video editing.
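
To make the described pipeline concrete, the following is a minimal sketch of a C3D-style frame-level occlusion classifier in PyTorch. The clip length, layer widths, pooling schedule, and sigmoid head are illustrative assumptions only; the paper's actual architecture, its parameter-reduction scheme, and the proposed loss function are not reproduced here.

```python
# Minimal sketch of a C3D-style frame-level occlusion classifier (PyTorch).
# Hypothetical layer sizes; not the architecture proposed in the paper.
import torch
import torch.nn as nn

class OcclusionClassifier3D(nn.Module):
    def __init__(self):
        super().__init__()
        # Stacked 3D convolutions extract joint spatial-temporal features
        # from a short clip of consecutive frames.
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        # Binary head: probability that the clip's centre frame is occluded.
        self.head = nn.Linear(128, 1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels=3, clip_len, height, width)
        feat = self.features(clip).flatten(1)
        return torch.sigmoid(self.head(feat))

# Usage sketch: score one 16-frame clip of 112x112 frames.
model = OcclusionClassifier3D()
clip = torch.randn(1, 3, 16, 112, 112)
occlusion_prob = model(clip)  # tensor of shape (1, 1)
```

In practice such a model would be run with a sliding window over the video so that every frame receives an occlusion probability, which a correction step can then refine.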


Supplemental Material

3394171.3413725.mp4

In this paper, we propose a novel approach to occlusion detection for automatic video editing. A C3D-based method is implemented to detect frames with occlusion in videos. To verify the effectiveness of the proposed method, we introduce a new large-scale dataset. The experimental results show that our method can effectively recognize occlusions.
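
The abstract also mentions an occlusion correction algorithm applied to the frame-level predictions. Since that algorithm is not spelled out here, the sketch below shows only a generic stand-in: a sliding-window majority vote (with an arbitrarily chosen window size) that removes isolated mispredictions from a sequence of per-frame occlusion labels.

```python
# Generic temporal smoothing of per-frame occlusion labels (0 = clear, 1 = occluded).
# Illustrative stand-in only; not the correction algorithm from the paper.
from typing import List

def smooth_predictions(frame_labels: List[int], window: int = 3) -> List[int]:
    """Relabel each frame by a strict majority vote over a centred window."""
    half = window // 2
    n = len(frame_labels)
    corrected = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        votes = frame_labels[lo:hi]
        # Label the frame occluded only if a strict majority of the window
        # (including the frame itself) is occluded.
        corrected.append(1 if 2 * sum(votes) > len(votes) else 0)
    return corrected

# The isolated non-occluded frame inside an occluded run is flipped back to 1.
print(smooth_predictions([1, 1, 1, 0, 1, 1, 1]))  # -> [1, 1, 1, 1, 1, 1, 1]
```

Larger windows suppress longer spurious runs but can blur the exact occlusion boundaries, so the window size is a trade-off between robustness and frame-level precision.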


Published in

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171

Copyright © 2020 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article

Acceptance Rates: Overall acceptance rate of 995 of 4,171 submissions, 24%

