TrackAgent: 6D Object Tracking via Reinforcement Learning

Röhrl, Konstantin; Bauer, Dominik; Patten, Timothy; Vincze, Markus

doi:10.1007/978-3-031-44137-0_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14253)

Included in the following conference series:

International Conference on Computer Vision Systems

Abstract

Tracking an object’s 6D pose, while either the object itself or the observing camera is moving, is important for many robotics and augmented reality applications. While exploiting temporal priors eases this problem, object-specific knowledge is required to recover when tracking is lost. Under the tight time constraints of the tracking task, RGB(D)-based methods are often conceptually complex or rely on heuristic motion models. In comparison, we propose to simplify object tracking to a reinforced point cloud (depth only) alignment task. This allows us to train a streamlined approach from scratch with limited amounts of sparse 3D point clouds, compared to the large datasets of diverse RGBD sequences required in previous works. We incorporate temporal frame-to-frame registration with object-based recovery by frame-to-model refinement using a reinforcement learning (RL) agent that jointly solves for both objectives. We also show that the RL agent’s uncertainty and a rendering-based mask propagation are effective reinitialization triggers.
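
To illustrate how an agent's uncertainty could serve as a reinitialization trigger, the following is a minimal sketch and not the authors' implementation. The abstract does not state how uncertainty is measured, so this example assumes a discrete action distribution and uses its Shannon entropy, normalized by the maximum possible entropy. The function names `policy_entropy` and `should_reinitialize` and the parameter `threshold_ratio` are hypothetical.

```python
import math
from typing import Sequence


def policy_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution.

    The input is normalized first, so unnormalized non-negative
    weights are accepted as well.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    total = sum(probs)
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    entropy = 0.0
    for p in probs:
        q = p / total
        if q > 0:  # 0 * log(0) is treated as 0
            entropy -= q * math.log(q)
    return entropy


def should_reinitialize(probs: Sequence[float],
                        threshold_ratio: float = 0.9) -> bool:
    """Return True if the policy is close to uniform (highly uncertain).

    ASSUMPTION: the threshold on normalized entropy is illustrative;
    the paper's actual trigger criterion may differ.
    """
    if len(probs) < 2:
        raise ValueError("need at least two actions")
    max_entropy = math.log(len(probs))  # entropy of the uniform distribution
    return policy_entropy(probs) >= threshold_ratio * max_entropy
```

A near-uniform distribution such as `[0.25, 0.25, 0.25, 0.25]` would trigger reinitialization under this criterion, whereas a peaked one such as `[0.97, 0.01, 0.01, 0.01]` would not.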

We gratefully acknowledge the support of the EU-program EC Horizon 2020 for Research and Innovation under grant agreement No. 101017089, project TraceBot, the Austrian Science Fund (FWF), project No. J 4683, and Abyss Solutions Pty Ltd.


Author information

Authors and Affiliations

TU Wien, Vienna, Austria
Konstantin Röhrl, Dominik Bauer & Markus Vincze
Columbia University, New York, USA
Dominik Bauer
Abyss Solutions Pty Ltd, Sydney, Australia
Timothy Patten

Corresponding author

Correspondence to Dominik Bauer.

Editor information

Editors and Affiliations

UC San Diego, La Jolla, CA, USA
Henrik I. Christensen
Queensland University of Technology, Brisbane, QLD, Australia
Peter Corke
KU Leuven, Leuven, Belgium
Renaud Detry
TU Wien, Vienna, Austria
Jean-Baptiste Weibel
TU Wien, Vienna, Austria
Markus Vincze

About this paper

Cite this paper

Röhrl, K., Bauer, D., Patten, T., Vincze, M. (2023). TrackAgent: 6D Object Tracking via Reinforcement Learning. In: Christensen, H.I., Corke, P., Detry, R., Weibel, JB., Vincze, M. (eds) Computer Vision Systems. ICVS 2023. Lecture Notes in Computer Science, vol 14253. Springer, Cham. https://doi.org/10.1007/978-3-031-44137-0_27

DOI: https://doi.org/10.1007/978-3-031-44137-0_27
Published: 21 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44136-3
Online ISBN: 978-3-031-44137-0
eBook Packages: Computer Science, Computer Science (R0)
