Abstract
Tracking an object’s 6D pose, while either the object itself or the observing camera is moving, is important for many robotics and augmented reality applications. While exploiting temporal priors eases this problem, object-specific knowledge is required to recover when tracking is lost. Under the tight time constraints of the tracking task, RGB(D)-based methods are often conceptually complex or rely on heuristic motion models. In contrast, we propose to simplify object tracking to a reinforced point cloud (depth-only) alignment task. This allows us to train a streamlined approach from scratch with limited amounts of sparse 3D point clouds, rather than the large datasets of diverse RGBD sequences required by previous works. We combine temporal frame-to-frame registration with object-based recovery by frame-to-model refinement, using a reinforcement learning (RL) agent that jointly solves for both objectives. We also show that the RL agent’s uncertainty and a rendering-based mask propagation are effective reinitialization triggers.
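To make the "tracking as depth-only point cloud alignment" framing concrete, the following is a minimal sketch under explicit assumptions, not the TrackAgent implementation. Classical point-to-point ICP (Besl and McKay, 1992) stands in for the RL agent as the alignment step. Each frame is warm-started from the previous pose, which serves as the temporal prior, and is then refined against the object model. The entropy of a hypothetical discrete action distribution stands in for the agent's uncertainty as a reinitialization trigger. All function names (`kabsch`, `refine_pose`, `policy_entropy`, `needs_reinit`, `track`) and the threshold parameter are illustrative assumptions.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with R @ src[i] + t ~= dst[i] for paired points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c


def refine_pose(model, observed, R, t, iters=50):
    """Point-to-point ICP (a stand-in for the RL agent): refine the pose (R, t)
    mapping model points (object frame) onto observed depth points (camera frame)."""
    for _ in range(iters):
        posed = model @ R.T + t
        # Brute-force nearest posed model point for every observed point.
        d2 = ((observed[:, None, :] - posed[None, :, :]) ** 2).sum(axis=-1)
        nn = d2.argmin(axis=1)
        dR, dt = kabsch(posed[nn], observed)
        R, t = dR @ R, dR @ t + dt                # compose the increment onto the pose
    return R, t


def policy_entropy(probs):
    """Mean entropy of per-step action distributions (hypothetical uncertainty proxy)."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum(axis=-1).mean())


def needs_reinit(probs, threshold):
    """Hypothetical trigger: request reinitialization when the policy is too uncertain."""
    return policy_entropy(probs) > threshold


def track(model, frames, R0, t0):
    """Warm-start each frame from the previous pose (temporal prior), then refine
    the pose against the object model (frame-to-model)."""
    R, t = R0, t0
    poses = []
    for observed in frames:
        R, t = refine_pose(model, observed, R, t)
        poses.append((R.copy(), t.copy()))
    return poses
```

Because the previous pose is already close to the current one for small inter-frame motion, the local alignment only has to correct a small residual. This is the property that makes the temporal prior useful. When that assumption breaks, a trigger such as `needs_reinit` would hand control back to a global pose estimator.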
We gratefully acknowledge the support of the EU-program EC Horizon 2020 for Research and Innovation under grant agreement No. 101017089, project TraceBot, the Austrian Science Fund (FWF), project No. J 4683, and Abyss Solutions Pty Ltd.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Röhrl, K., Bauer, D., Patten, T., Vincze, M. (2023). TrackAgent: 6D Object Tracking via Reinforcement Learning. In: Christensen, H.I., Corke, P., Detry, R., Weibel, JB., Vincze, M. (eds) Computer Vision Systems. ICVS 2023. Lecture Notes in Computer Science, vol 14253. Springer, Cham. https://doi.org/10.1007/978-3-031-44137-0_27
DOI: https://doi.org/10.1007/978-3-031-44137-0_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44136-3
Online ISBN: 978-3-031-44137-0
eBook Packages: Computer Science, Computer Science (R0)