Iterative residual tuning for system identification and sim-to-real robot learning

Abstract

Robots are increasingly learning complex skills in simulation, heightening the need for realistic simulation environments. Existing techniques for approximating real-world physics with a simulation require extensive observation data and/or thousands of simulation samples. This paper presents iterative residual tuning (IRT), a deep learning system identification technique that modifies a simulator’s parameters to better match reality using minimal real-world observations. IRT learns to estimate the parameter difference between two parameterized models, allowing repeated iterations to converge on the true parameters similarly to gradient descent. In this paper, we develop and analyze IRT in depth, including its similarities and differences with gradient descent. Our IRT implementation, TuneNet, is pre-trained via supervised learning over an auto-generated simulated dataset. We show that TuneNet can perform rapid, efficient system identification even when the true parameter values lie well outside those in the network’s training data, and can also learn real-world parameter values from visual data. We apply TuneNet to a sim-to-real task transfer experiment, allowing a robot to perform a dynamic manipulation task with a new object after a single observation.

Notes

  1. https://pybullet.org.

  2. Our reference implementation of CMA-ES would not run for the case of a single tunable parameter, so no CMA-ES comparison is reported for the Kinova and Cartpole environments.

  3. https://github.com/CMA-ES/pycma.

  4. http://github.com/SheffieldML/GPy.

References

  • Ajay, A., Wu, J., Fazeli, N., Bauza, M., Kaelbling, L. P., Tenenbaum, J. B., et al. (2018). Augmenting physical simulators with stochastic neural networks: Case study of planar pushing and bouncing. In International conference on intelligent robots and systems (IROS).

  • Allevato, A., Schaertl Short, E., Pryor, M., & Thomaz, A. L. (2019). TuneNet: One-shot residual tuning for system identification and sim-to-real robot task transfer. CoRR arXiv:1907.11200v3.

  • Ayusawa, K., Venture, G., & Nakamura, Y. (2013). Identifiability and identification of inertial parameters using the underactuated base-link dynamics for legged multibody systems. The International Journal of Robotics Research, 33, 446–468. https://doi.org/10.1177/0278364913495932.

  • Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT (pp. 177–186). Berlin: Springer.

  • Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. CoRR arXiv:1606.01540v1.

  • Chang, M. B., Ullman, T., Torralba, A., & Tenenbaum, J. B. (2016). A compositional object-based approach to learning physical dynamics. Preprint arXiv:1612.00341.

  • Chebotar, Y., Handa, A., Makoviychuk, V., Macklin, M., Issac, J., Ratliff, N., et al. (2018). Closing the sim-to-real loop: Adapting simulation randomization with real world experience. CoRR.

  • de Avila Belbute-Peres, F., Smith, K., Allen, K., Tenenbaum, J., & Kolter, J. Z. (2018). End-to-end differentiable physics for learning and control. In Advances in neural information processing systems (Vol. 31, pp. 7178–7189). Montreal. https://papers.nips.cc/paper/7948-end-to-end-differentiable-physics-for-learning-and-control.

  • Dietrich, O. E. (2015). USA racquetball official rules of racquetball. USA Racquetball.

  • Fazeli, N., Tedrake, R., & Rodriguez, A. (2015). Identifiability analysis of planar rigid-body frictional contact. In International symposium on robotics research (ISRR). Berlin: Springer.

  • Gautier, M., & Khalil, W. (1988). On the identification of the inertial parameters of robots. In Proceedings of the 27th IEEE conference on decision and control (pp. 2264–2269). New York: IEEE. https://doi.org/10.1109/CDC.1988.194738.

  • Golemo, F., Taïga, A. A., Oudeyer, P. Y., & Courville, A. (2018). Sim-to-real transfer with neural-augmented robot simulation. In 2nd conference on robot learning (CoRL18) (Vol. 87, pp. 817–828).

  • Hanna, J. P., & Stone, P. (2017). Grounded action transformation for robot learning in simulation. In Proceedings of the 31st AAAI conference on artificial intelligence (AAAI-17) (pp. 3834–3840).

  • Hanna, J. P., Thomas, P. S., Stone, P., & Niekum, S. (2017). Data-efficient policy evaluation through behavior policy search. In D. Precup, & Y. W. Teh (Eds.), Proceedings of the 34th international conference on machine learning, PMLR, Sydney, Australia, proceedings of machine learning research (Vol. 70, pp. 1394–1403).

  • Hansen, N. (2016). The CMA evolution strategy: A tutorial. CoRR arXiv:1604.00772.

  • Heiden, E., Millard, D., Zhang, H., & Sukhatme, G. S. (2019). Interactive differentiable simulation. CoRR arXiv:1905.10706.

  • Hennig, P., & Schuler, C. J. (2012). Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13(June), 1809–1837.

  • Hoffman, J., Tzeng, E., Park, T., Zhu, J. Y., Isola, P., Saenko, K., et al. (2017). CyCADA: Cycle-consistent adversarial domain adaptation. CoRR arXiv:1711.03213.

  • Hollerbach, J., Khalil, W., & Gautier, M. (2008). Model identification. In B. Siciliano & O. Khatib (Eds.), Springer handbook of robotics (pp. 321–344). Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-540-30301-5_15.

  • James, S., Wohlhart, P., Kalakrishnan, M., Kalashnikov, D., Irpan, A., Ibarz, J., et al. (2019). Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-canonical adaptation networks. In Computer vision and pattern recognition (CVPR).

  • Johannink, T., Bahl, S., Nair, A., Luo, J., Kumar, A., Loskyll, M., et al. (2018). Residual reinforcement learning for robot control. CoRR arXiv:1812.03201.

  • Khosla, P., & Kanade, T. (1985). Parameter identification of robot dynamics. In 1985 24th IEEE conference on decision and control (pp. 1754–1760). New York: IEEE. https://doi.org/10.1109/CDC.1985.268838.

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. Preprint arXiv:1412.6980.

  • Kloss, A., Schaal, S., & Bohg, J. (2017). Combining learned and analytical models for predicting action effects. CoRR.

  • Kolev, S., & Todorov, E. (2015). Physically consistent state estimation and system identification for contacts. In IEEE-RAS international conference on humanoid robots (Vol. 2015, pp. 1036–1043). New York: IEEE. https://doi.org/10.1109/HUMANOIDS.2015.7363481.

  • Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1), 503–528. https://doi.org/10.1007/BF01589116.

  • Lutter, M., Ritter, C., & Peters, J. (2019). Deep Lagrangian networks: Using physics as model prior for deep learning. In 7th international conference on learning representations (ICLR).

  • Ma, D., & Rodriguez, A. (2018). Friction variability in auto-collected dataset of planar pushing experiments and anisotropic friction. IEEE Robotics and Automation Letters, 3(4), 3232–3239. https://doi.org/10.1109/LRA.2018.2851026.

  • Mehta, B., Diaz, M., Golemo, F., Pal, C. J., & Paull, L. (2019). Active domain randomization. CoRR arXiv:1904.04762.

  • Paszke, A., Chanan, G., Lin, Z., Gross, S., Yang, E., Antiga, L., & Devito, Z. (2017). Automatic differentiation in PyTorch. In 31st conference on neural information processing systems (NIPS), Long Beach, CA (pp. 1–4). https://doi.org/10.1017/CBO9781107707221.009.

  • Peng, X. B., Andrychowicz, M., Zaremba, W., & Abbeel, P. (2018). Sim-to-real transfer of robotic control with dynamics randomization. In IEEE international conference on robotics and Automation (ICRA) (pp. 1–8).

  • Pinto, L., Andrychowicz, M., Welinder, P., Zaremba, W., & Abbeel, P. (2017). Asymmetric actor critic for image-based robot learning. In Robotics: Science and systems XIV, robotics: Science and systems foundation. https://doi.org/10.1007/s10869-008-9083-z.

  • Qian, N. (1999). On the momentum term in gradient descent learning algorithms. Neural Networks, 12(1), 145–151.

  • Rajeswaran, A., Ghotra, S., Ravindran, B., & Levine, S. (2016). EPOpt: Learning robust neural network policies using model ensembles. CoRR. https://doi.org/10.1073/pnas.211563298.

  • Ramos, F., Possas, R. C., & Fox, D. (2019). BayesSim: Adaptive domain randomization via probabilistic inference for robotics simulators. In Proceedings of robotics: Science and systems, Freiburg im Breisgau, Germany. https://doi.org/10.15607/RSS.2019.XV.029.

  • Ruder, S. (2016). An overview of gradient descent optimization algorithms. CoRR arXiv:1609.04747.

  • Rusu, A. A., Večerík, M., Rothörl, T., Heess, N., Pascanu, R., & Hadsell, R. (2017). Sim-to-real robot learning from pixels with progressive nets. In Conference on robot learning (pp. 262–270).

  • Silver, T., Allen, K., Tenenbaum, J., & Kaelbling, L. P. (2018). Residual policy learning. CoRR arXiv:1812.06298.

  • Tan, J., Xie, Z., Boots, B., & Liu, C. K. (2016). Simulation-based design of dynamic controllers for humanoid balancing. In IEEE international conference on intelligent robots and systems (pp. 2729–2736). New York: IEEE. https://doi.org/10.1109/IROS.2016.7759424.

  • Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S., Vanhoucke, V. (2018). Sim-to-real: Learning agile locomotion for quadruped robots. In Robotics: Science and systems XIV, robotics: Science and systems foundation. arXiv:1804.10332v2.

  • Tieleman, T., & Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2), 26–31.

  • Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., & Abbeel, P. (2017). Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE international conference on intelligent robots and systems (pp. 23–30). New York: IEEE https://doi.org/10.1109/IROS.2017.8202133.

  • Todorov, E. (2011). A convex, smooth and invertible contact model for trajectory optimization. In Proceedings—IEEE international conference on robotics and automation (pp. 1071–1076). New York: IEEE. https://doi.org/10.1109/ICRA.2011.5979814.

  • Toussaint, M., Allen, K., Smith, K., & Tenenbaum, J. (2018). Differentiable physics and stable modes for tool-use and manipulation planning. In Robotics: Science and systems XIV. https://doi.org/10.15607/RSS.2018.XIV.044.

  • Xu, Z., Wu, J., Zeng, A., Tenenbaum, J., Song, S. (2019). DensePhysNet: Learning dense physical object representations via multi-step dynamic interactions. In Proceedings of robotics: Science and systems, Freiburg im Breisgau, Germany. https://doi.org/10.15607/RSS.2019.XV.046.

  • Yu, W., Tan, J., Liu, C. K., & Turk, G. (2017). Preparing for the unknown: Learning a universal policy with online system identification. In Proceedings of robotics: Science and systems, Cambridge, MA. https://doi.org/10.15607/RSS.2017.XIII.048.

  • Zeng, A., Song, S., Lee, J., Rodriguez, A., & Funkhouser, T. A. (2019). TossingBot: Learning to throw arbitrary objects with residual physics. In Proceedings of robotics: Science and systems, Freiburg im Breisgau, Germany. https://doi.org/10.15607/RSS.2019.XV.004.

  • Zhang, J., Tai, L., Yun, P., Xiong, Y., Liu, M., Boedecker, J., et al. (2019). Vr-goggles for robots: Real-to-sim domain adaptation for visual control. IEEE Robotics and Automation Letters, 4(2), 1148–1155.

  • Zhu, S., Kimmel, A., Bekris, K. E., Boularias, A. (2018). Fast model identification via physics engines for data-efficient policy search. In Proceedings of the 27th international joint conference on artificial intelligence (IJCAI) (pp. 3249–3256).

  • Zhu, S., Surovik, D., Bekris, K. E., & Boularias, A. (2019). Closing the reality gap of robotic simulators through task-oriented Bayesian optimization. Journal of Machine Learning Research, 20, 1–24.


Acknowledgements

We thank Josiah Hanna and Scott Niekum for their helpful suggestions on this paper. This work was supported by United States National Science Foundation (NSF) Grants IIS-1564080 and IIS-1724157, and United States Office of Naval Research (ONR) Grants N000141612835 and N000141612785.

Author information

Corresponding author

Correspondence to Adam David Allevato.

Ethics declarations

Conflict of interest

In addition to the funding listed above, the authors state the following relationships and interests. Adam Allevato holds stock options in Diligent Robotics, Inc. Elaine Schaertl Short is supported by a Clare Boothe Luce Professorship from the Henry Luce Foundation and was funded by Microsoft for travel to the AI Breakthroughs workshop. Mitch Pryor is a consultant for Finnegan, LLP, received travel support from the British Consulate, and is funded under the research Grants DOE-LANL Grant 407626, DOE-IRP Grant DE-EM0004384, Army Research Office Grant W911NF-17-2-0180, Army Research Office (CMU Subcontract) W911NF-18-2-0218, Phillips 66 Project #UTA19-000187, Woodside Project #UTA17-001210, Wilder Systems Project #UTA18-000696, as well as the RAPID UT Industry Affiliate Program. Andrea L. Thomaz is the CEO of, and holds stock in, Diligent Robotics, Inc.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 TuneNet architecture implementation details

We implement all networks using PyTorch (Paszke et al. 2017), and train them using stochastic gradient descent. Our TuneNet architecture for all experiments consists of fully-connected (FC) layers with ReLU activation functions for each of the feature extractors, with each feature extractor output size set to 128. The estimator therefore has an input size of 256. We model the estimator with two fully connected layers with a hidden size of 128, ReLU activation for the first layer, and a tanh activation for the final output.
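
For illustration, a minimal PyTorch sketch of this architecture is given below. The observation dimensionality, the number of tuned parameters, and the use of a single linear layer per feature extractor are assumptions made for the example rather than the exact values used in our experiments.

```python
import torch
import torch.nn as nn

class TuneNetSketch(nn.Module):
    """Minimal sketch of the TuneNet architecture described above.
    obs_dim, n_params, and the single-layer feature extractors are
    illustrative assumptions."""

    def __init__(self, obs_dim=100, n_params=1, feat_dim=128, hidden=128):
        super().__init__()
        # One feature extractor per model rollout: FC + ReLU, output size 128.
        self.extract_a = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        self.extract_b = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        # Estimator: input size 256 (two stacked 128-d features), one hidden
        # layer of size 128 with ReLU, and a tanh on the final output.
        self.estimator = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_params), nn.Tanh(),
        )

    def forward(self, obs_a, obs_b):
        features = torch.cat([self.extract_a(obs_a), self.extract_b(obs_b)], dim=-1)
        return self.estimator(features)  # predicted parameter difference
```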

1.2 Details for Experiment 1: End-effector mass identification

The object’s mass is randomly chosen, \(m \sim U(0\,\text{kg}, 1\,\text{kg})\), for each training point. In each simulated episode, which runs for 5 s, the robot moves its end effector in a circle defined by the coordinates \([-0.4,\ 0.2\cos(\tau),\ 0.2\sin(\tau)]\), \(\tau = 1.2t\). We collected 1000, 300, and 300 episodes for the training, validation, and test sets, respectively. We trained the model for 1000 epochs using a batch size of 50, a learning rate of 0.01, L2 regularization \(\lambda = 10^{-3}\), and 1% learning rate decay every 50 epochs. During training, the input torques were normalized to the range [0, 1] but the outputs were not transformed.
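
A short sketch of this episode setup is given below; the control timestep is an assumption, and the trajectory is expressed only as the sequence of target end-effector positions.

```python
import numpy as np

def sample_experiment1_episode(dt=0.01, duration=5.0, rng=None):
    """Sketch of the Experiment 1 setup: a random end-effector mass
    m ~ U(0 kg, 1 kg) and the circular end-effector target
    [-0.4, 0.2 cos(tau), 0.2 sin(tau)], tau = 1.2 t, over a 5 s episode.
    The timestep dt is an assumption."""
    rng = rng or np.random.default_rng()
    mass = rng.uniform(0.0, 1.0)
    t = np.arange(0.0, duration, dt)
    tau = 1.2 * t
    targets = np.stack([-0.4 * np.ones_like(tau),   # x held at -0.4 m
                        0.2 * np.cos(tau),          # y
                        0.2 * np.sin(tau)],         # z
                       axis=1)
    return mass, targets
```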

1.3 Details for Experiment 2: Bouncing ball COR tuning

For both datasets, we trained TuneNet for 200 epochs, using a batch size of 50 and a learning rate of 0.01, with learning rate decay of 1% per 5 epochs. The L2 regularization constant \(\lambda \) was set to 0.01. During training, the input and output channels were normalized to the range [0, 1].
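
This schedule maps directly onto a standard PyTorch optimizer and step scheduler, sketched below; the use of plain SGD and the stand-in model are assumptions made for the example.

```python
import torch
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(200, 1)  # stand-in for the TuneNet module sketched above
# L2 regularization (lambda = 0.01) via weight_decay; initial learning rate 0.01.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=0.01)
# 1% learning-rate decay every 5 epochs.
scheduler = StepLR(optimizer, step_size=5, gamma=0.99)

for epoch in range(200):
    # ... iterate over mini-batches of size 50, compute the loss,
    #     and call loss.backward() and optimizer.step() here ...
    scheduler.step()
```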

For the Obs case, the visual OpenCV tracker uses hue, saturation, and value windows to threshold a camera image and isolate an object. It then calculates the y pixel position of the ball in each frame and normalizes the image pixel positions to the range [0, 1]. For each randomized simulation render, the PyBullet camera was placed at random polar coordinates \(r \sim U(5\,\text{m}, 10\,\text{m})\), \(\theta \sim U(0, 2\pi)\), \(z \sim U(5\,\text{m}, 8\,\text{m})\), and aimed at the point \((0, 0, 2\,\text{m})\).
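
The sketch below illustrates both pieces: an HSV-window tracker of the kind described, and the randomized camera placement. The specific HSV bounds, the largest-contour heuristic, and the suggested PyBullet call are assumptions for the example, not the exact implementation.

```python
import cv2
import numpy as np

def ball_y_position(frame_bgr, hsv_low, hsv_high):
    """Threshold the frame in HSV, take the largest blob as the ball, and
    return its normalized y pixel position in [0, 1]. The HSV window bounds
    and the largest-contour heuristic are assumptions for this sketch."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_low), np.array(hsv_high))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    _, y, _, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return (y + h / 2.0) / frame_bgr.shape[0]

def random_camera_pose(rng=None):
    """Sample the randomized camera placement: r ~ U(5 m, 10 m),
    theta ~ U(0, 2*pi), z ~ U(5 m, 8 m), aimed at the point (0, 0, 2 m)."""
    rng = rng or np.random.default_rng()
    r = rng.uniform(5.0, 10.0)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    z = rng.uniform(5.0, 8.0)
    eye = [r * np.cos(theta), r * np.sin(theta), z]
    target = [0.0, 0.0, 2.0]
    # e.g. pybullet.computeViewMatrix(eye, target, cameraUpVector=[0, 0, 1])
    return eye, target
```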

1.3.1 Baseline implementation details

For CMA-ES, we use the open-source package pycma (see Note 3). The inputs to CMA-ES are identical to those provided to TuneNet, with the initial proposed physics parameters being used as the initial guess for the CMA algorithm. The CMA tolerance was set to 0.1.
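
A sketch of this baseline using pycma's interface follows; the sum-of-squares discrepancy objective, the initial step size, and the mapping of the 0.1 tolerance onto the 'tolx' option are assumptions for the example.

```python
import cma
import numpy as np

def tune_with_cmaes(initial_params, simulate, observation, sigma0=0.1):
    """Minimize the discrepancy between the simulated rollout and the
    observation, starting from the initial parameter guess. The objective,
    sigma0, and the 'tolx' setting are assumptions; pycma also requires at
    least two tunable parameters (see Note 2)."""
    def cost(params):
        return float(np.sum((simulate(params) - observation) ** 2))

    es = cma.CMAEvolutionStrategy(list(initial_params), sigma0, {'tolx': 0.1})
    es.optimize(cost)
    return es.result.xbest  # best parameters found
```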

For Greedy Entropy Search (EntSearch in this paper), we re-implemented the algorithm provided in Zhu et al. (2018). The underlying Gaussian Process (GP) sampling was implemented using the open-source Python package GPy (see Note 4), and we used an RBF GP kernel. Again, the initial proposed physics parameters were used as the initial guess for the algorithm. We discretize the parameter space into 50 bins, and use a Gaussian process sampling population size of \(n=100\). Because we do not calculate the value function for each policy, we cannot use the value as the stopping criterion as in Zhu et al. (2018). Instead, we stop iterating when the predicted parameter value does not change for two consecutive timesteps. This stopping criterion was determined empirically for our experiment after being shown to outperform two other stopping criteria: no stopping (which performed poorly because of noise in the GP sampling), and stopping after the parameter estimate changed by less than a threshold \(\epsilon > 0\).
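
The GP component of this baseline can be sketched with GPy as follows. The sketch covers only the posterior-sampling core over a 50-bin discretization with \(n=100\) samples, not the full greedy entropy search acquisition of Zhu et al. (2018); the parameter bounds and the vote-counting readout are assumptions.

```python
import numpy as np
import GPy

def gp_parameter_estimate(sampled_params, sampled_costs, bounds,
                          n_bins=50, n_samples=100):
    """Fit an RBF-kernel GP to (parameter, cost) pairs, draw posterior
    samples over a 50-bin grid, and return the bin most often predicted to
    minimize the cost. Only a sketch of the sampling core; the acquisition
    and stopping logic described above are omitted."""
    X = np.asarray(sampled_params, dtype=float).reshape(-1, 1)
    Y = np.asarray(sampled_costs, dtype=float).reshape(-1, 1)
    model = GPy.models.GPRegression(X, Y, GPy.kern.RBF(input_dim=1))
    model.optimize()

    grid = np.linspace(bounds[0], bounds[1], n_bins).reshape(-1, 1)
    samples = model.posterior_samples_f(grid, size=n_samples).reshape(n_bins, -1)
    best_bins = samples.argmin(axis=0)                 # best bin per posterior sample
    counts = np.bincount(best_bins, minlength=n_bins)
    return float(grid[counts.argmax(), 0])
```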

The initial guesses and simulated outcomes supplied to both CMA-ES and Greedy Entropy Search were identical to the inputs supplied to TuneNet, including all normalization transformations applied to the data.

The “Direct Prediction” network uses the same network architecture as TuneNet, but rather than stacking the observations from two models, the single model input is padded with zeros. This network is trained from scratch, separately from TuneNet (no weights are shared between the two models).

About this article

Cite this article

Allevato, A.D., Schaertl Short, E., Pryor, M. et al. Iterative residual tuning for system identification and sim-to-real robot learning. Auton Robot 44, 1167–1182 (2020). https://doi.org/10.1007/s10514-020-09925-w
