research-article

Incremental multi-view object detection from a moving camera

Author: Asako Kanezaki

MMAsia '20: Proceedings of the 2nd ACM International Conference on Multimedia in Asia

Article No.: 4, Pages 1 - 7

https://doi.org/10.1145/3444685.3446257

Published: 03 May 2021

Abstract

Object detection in a single image is a challenging problem due to clutter, occlusion, and the large variety of viewing locations. This task can benefit from integrating multi-frame information captured by a moving camera. In this paper, we propose a method that incrementally integrates object detection scores extracted from multiple frames captured from different viewpoints. For each frame, we run an efficient end-to-end object detector that outputs object bounding boxes, each associated with scores for categories and poses. The scores of detected objects are then stored in grid locations in 3D space. After multiple frames have been observed, the object scores stored in each grid location are integrated based on the best object pose hypothesis. This strategy requires object categories and poses to be consistent across frames, and thus it significantly suppresses misdetections. The performance of the proposed method is evaluated on our newly created multi-class object dataset captured in robot simulation and real environments, as well as on a public benchmark dataset.
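The integration step described in the abstract can be pictured with the minimal sketch below. It is not the paper's implementation. The discretisation of poses into azimuth bins, the summing of log scores, the grid resolution, and every name used (`ScoreGrid`, `add_detection`, `camera_bin`, `peaked_scores`) are assumptions made for illustration only. Each detection contributes a category-by-pose score matrix to the 3D grid cell it falls in; the matrix is first shifted by the camera's viewing direction so that all frames vote in a common world-pose frame, and the best joint (category, pose) hypothesis is read off after several frames.

```python
from collections import defaultdict

import numpy as np

NUM_CLASSES = 3   # hypothetical number of object categories
NUM_POSES = 8     # hypothetical number of discrete azimuth bins
CELL_SIZE = 0.25  # hypothetical grid resolution in metres


class ScoreGrid:
    """Accumulates per-frame category/pose scores in sparse 3D grid cells."""

    def __init__(self):
        self.log_scores = defaultdict(lambda: np.zeros((NUM_CLASSES, NUM_POSES)))
        self.counts = defaultdict(int)

    @staticmethod
    def cell_of(position):
        # Map a 3D point (metres) to an integer grid index.
        idx = np.floor(np.asarray(position, dtype=float) / CELL_SIZE)
        return tuple(int(i) for i in idx)

    def add_detection(self, position, scores, camera_bin):
        # scores[c, v]: score of category c seen from view bin v (camera frame).
        # Assumed convention: world pose p is seen as view (p - camera_bin) % NUM_POSES,
        # so rolling by camera_bin re-indexes the columns by world pose.
        cell = self.cell_of(position)
        aligned = np.roll(scores, camera_bin, axis=1)
        self.log_scores[cell] += np.log(np.clip(aligned, 1e-6, 1.0))
        self.counts[cell] += 1
        return cell

    def best(self, cell):
        # Best joint (category, world pose) hypothesis and its mean log score.
        acc = self.log_scores[cell]
        cls, pose = np.unravel_index(np.argmax(acc), acc.shape)
        return int(cls), int(pose), float(acc[cls, pose]) / max(self.counts[cell], 1)


def peaked_scores(cls, view_bin, peak=0.9, floor=0.01):
    # Toy detector output: one confident (category, view) entry, low elsewhere.
    scores = np.full((NUM_CLASSES, NUM_POSES), floor)
    scores[cls, view_bin] = peak
    return scores


grid = ScoreGrid()
true_class, true_pose = 1, 5
# (estimated 3D position of the detection, camera azimuth bin) for three frames
frames = [((1.10, 0.10, 0.60), 0), ((1.12, 0.12, 0.58), 2), ((1.08, 0.09, 0.62), 7)]
for i, (position, camera_bin) in enumerate(frames):
    scores = peaked_scores(true_class, (true_pose - camera_bin) % NUM_POSES)
    if i == 0:
        scores[2, 0] = 0.95  # spurious high score that appears in one frame only
    cell = grid.add_detection(position, scores, camera_bin)

best_class, best_pose, mean_log_score = grid.best(cell)
print(best_class, best_pose)
```

In this toy run the spurious score in the first frame is stronger than any single correct score, but it is not supported by the other two viewpoints, so the hypothesis that stays consistent across frames wins. This mirrors the consistency requirement stated in the abstract, under the assumptions listed above.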

Index Terms

Incremental multi-view object detection from a moving camera
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
      2. Computer vision tasks
        Scene understanding
        Vision for robotics

Published In

MMAsia '20: Proceedings of the 2nd ACM International Conference on Multimedia in Asia

March 2021

512 pages

ISBN: 9781450383080

DOI: 10.1145/3444685

General Chairs: Tat-Seng Chua (National University of Singapore), Jingdong Wang (Microsoft Research), Qi Tian (Huawei Noah's Ark)

Program Chairs: Cathal Gurrin (Dublin City University), Jia Jia (Tsinghua University), Hanwang Zhang (Nanyang Technological University), Qianru Sun (Singapore Management University)

Copyright © 2021 ACM.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MMAsia '20

Sponsor:

SIGMM

MMAsia '20: ACM Multimedia Asia

March 7, 2021

Virtual Event, Singapore

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
