research-article

EP-Net: More Efficient Pose Estimation Network with the Classification-based Key-points Detection

Authors:

Sheng Zhang

VSIP '20: Proceedings of the 2020 2nd International Conference on Video, Signal and Image Processing

Pages 100 - 108

https://doi.org/10.1145/3442705.3444923

Published: 21 March 2021

Abstract

The performance of 6D object pose estimation algorithms is mainly constrained by challenges related to texture, occlusion, and symmetry. Recent works, however, focus mostly on achieving strong single-object performance and are inefficient and less accurate on multi-object tasks. In this article, we build on several previous works to improve the efficiency of the algorithm so that it better serves complex scenarios. We use a two-stage pipeline to obtain high-precision poses for multiple objects: the first stage detects key points, and the second stage solves the Perspective-n-Point (PnP) problem to obtain the 6DoF pose. We propose a simpler and more efficient classification-based algorithm for detecting key points on the object surface. Experiments on the LINEMOD, Occlusion-LINEMOD, and YCB-Video datasets show that the proposed method is robust and outperforms state-of-the-art (SOTA) methods. In particular, it outperforms SOTA methods by a large margin on the challenging Occlusion-LINEMOD dataset. Our approach is more robust to occlusion and more efficient for multi-object pose estimation. The code will be available at: https://github.com/CvHadesSun/E2P.
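The two-stage pipeline summarized in the abstract (key-point detection, then PnP) can be illustrated with a small NumPy sketch. It is not the authors' EP-Net/E2P implementation, and every function name in it (`decode_keypoints`, `project`, `solve_pnp_dlt`) is hypothetical. For the first stage it assumes one common way of casting key-point localization as classification: each key point gets independent classifiers over discretized x and y bins, and the centre of the argmax bin is taken as the coordinate; the paper's own classification scheme may differ. For the second stage it substitutes a plain Direct Linear Transform (DLT) PnP solver, which needs at least six non-coplanar correspondences and does no outlier handling.

```python
import numpy as np


def decode_keypoints(logits_x, logits_y, bin_size=1.0):
    """Decode per-axis classification logits into 2D key-point coordinates.

    Assumed (hypothetical) head layout: logits_x has shape (K, Wx) and
    logits_y has shape (K, Hy), one row of bin scores per key point.
    The predicted bin is the argmax; the coordinate is that bin's centre.
    """
    bx = np.argmax(logits_x, axis=1)
    by = np.argmax(logits_y, axis=1)
    return np.stack([(bx + 0.5) * bin_size, (by + 0.5) * bin_size], axis=1)


def project(points_3d, R, t, K):
    """Pinhole projection of (N, 3) model points with pose (R, t) and intrinsics K."""
    cam = points_3d @ R.T + t      # model frame -> camera frame
    uvw = cam @ K.T                # camera frame -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]


def solve_pnp_dlt(points_3d, points_2d, K):
    """Estimate (R, t) from >= 6 non-coplanar 2D-3D correspondences via DLT."""
    n = len(points_3d)
    if n < 6:
        raise ValueError("DLT needs at least 6 correspondences")
    # Undo the intrinsics so the unknown 3x4 matrix is s * [R | t].
    pts_h = np.hstack([points_2d, np.ones((n, 1))])
    norm = pts_h @ np.linalg.inv(K).T
    x, y = norm[:, 0] / norm[:, 2], norm[:, 1] / norm[:, 2]
    rows = []
    for (X, Y, Z), u, v in zip(points_3d, x, y):
        # Each correspondence gives two linear equations in the 12 entries of P.
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The right singular vector with the smallest singular value gives P up to scale.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    P = vt[-1].reshape(3, 4)
    # Replace the left 3x3 block by its nearest orthogonal matrix.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:       # the recovered scale was negative: flip both
        R, t = -R, -t
    return R, t
```

In a real system the decoded 2D key points would be paired with the corresponding 3D key points defined on the object model before calling the solver. With noisy or occluded detections, an established robust solver such as OpenCV's `cv2.solvePnPRansac` would be the more typical choice, and the plain argmax decode limits precision to `bin_size` unless finer bins or an offset refinement are added.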

Information

Published In

ACM Other conferences

VSIP '20: Proceedings of the 2020 2nd International Conference on Video, Signal and Image Processing

December 2020

108 pages

ISBN:9781450388931

DOI:10.1145/3442705

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

VSIP '20

VSIP '20: 2020 2nd International Conference on Video, Signal and Image Processing

December 4 - 6, 2020

Jakarta, Indonesia

Article Metrics

Total Citations: 0
Total Downloads: 102

Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 10 Feb 2025
