TrackAgent: 6D Object Tracking via Reinforcement Learning

  • Conference paper
Computer Vision Systems (ICVS 2023)

Abstract

Tracking an object’s 6D pose, while either the object itself or the observing camera is moving, is important for many robotics and augmented reality applications. While exploiting temporal priors eases this problem, object-specific knowledge is required to recover when tracking is lost. Under the tight time constraints of the tracking task, RGB(D)-based methods are often conceptually complex or rely on heuristic motion models. In contrast, we propose to simplify object tracking to a reinforced point cloud (depth-only) alignment task. This allows us to train a streamlined approach from scratch with limited amounts of sparse 3D point clouds, compared to the large datasets of diverse RGBD sequences required in previous works. We combine temporal frame-to-frame registration with object-based recovery by frame-to-model refinement, using a reinforcement learning (RL) agent that jointly solves for both objectives. We also show that the RL agent’s uncertainty and a rendering-based mask propagation are effective reinitialization triggers.
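To make the notion of a point cloud alignment task concrete, the following is a minimal sketch of classical rigid alignment between a model point cloud and an observed one. It is not the RL agent proposed in the paper: it substitutes the closed-form SVD (Kabsch) solution and assumes known one-to-one point correspondences, which a real tracker does not have. The function names `kabsch_align` and `apply_transform` are hypothetical and chosen only for illustration.

```python
import numpy as np


def kabsch_align(source, target):
    """Least-squares rigid transform (R, t) that maps source onto target.

    Assumption: source and target are (N, 3) arrays whose rows correspond
    one-to-one. This is the classical closed-form solution, not the learned
    alignment policy described in the abstract.
    """
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)

    # 3x3 cross-covariance of the centered point sets.
    H = (source - src_centroid).T @ (target - tgt_centroid)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_centroid - R @ src_centroid
    return R, t


def apply_transform(points, R, t):
    """Apply the rigid transform to an (N, 3) point array."""
    return points @ R.T + t
```

In a tracking loop, `source` would play the role of the object model under the previous pose estimate and `target` the current depth observation. The estimated `(R, t)` is then the pose update for the new frame.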

We gratefully acknowledge the support of the EU-program EC Horizon 2020 for Research and Innovation under grant agreement No. 101017089, project TraceBot, the Austrian Science Fund (FWF), project No. J 4683, and Abyss Solutions Pty Ltd.

Author information

Corresponding author

Correspondence to Dominik Bauer.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Röhrl, K., Bauer, D., Patten, T., Vincze, M. (2023). TrackAgent: 6D Object Tracking via Reinforcement Learning. In: Christensen, H.I., Corke, P., Detry, R., Weibel, JB., Vincze, M. (eds) Computer Vision Systems. ICVS 2023. Lecture Notes in Computer Science, vol 14253. Springer, Cham. https://doi.org/10.1007/978-3-031-44137-0_27

  • DOI: https://doi.org/10.1007/978-3-031-44137-0_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44136-3

  • Online ISBN: 978-3-031-44137-0

  • eBook Packages: Computer Science, Computer Science (R0)
