Discovering relevant task spaces using inverse feedback control

Jetchev, Nikolay; Toussaint, Marc

doi:10.1007/s10514-014-9384-1


Published: 09 February 2014

Volume 37, pages 169–189, (2014)

Autonomous Robots

Nikolay Jetchev¹ &
Marc Toussaint²


Abstract

Learning complex skills by repeating and generalizing expert behavior is a fundamental problem in robotics. However, the usual approaches do not answer the question of which representations are appropriate for generating motion for a specific task. Since it is time-consuming for a human expert to manually design the motion control representation for a task, we propose to uncover such structure from data, namely observed motion trajectories. Inspired by Inverse Optimal Control, we present a novel method to learn a latent value function, imitate and generalize demonstrated behavior, and discover a task-relevant motion representation. We test our method, called Task Space Retrieval Using Inverse Feedback Control (TRIC), on several challenging high-dimensional tasks. TRIC learns the important control dimensions for the tasks from a few example movements and is able to robustly generalize to new situations.


Notes

To simplify even more we use $t_0 =1$ and $t_T = T$.
Note that the demonstrations may come from different situations, with objects placed at different locations. Since the features $y_t^i$ may be object-relative, they cannot be computed from $q_t^i$ alone (a dependence we neglect in our notation). We therefore also “record” $y_t^i$ and the Jacobians $\dfrac{\partial \phi }{\partial q}(q_t^i)$ for all demonstrations. If it is clear from the context that we are dealing with just one trajectory, we will skip the superscript $i$ and write just $q_t$ instead of $q_t^i$.
Here we write $\dfrac{\partial f}{\partial y}$ instead of $\dfrac{\partial f}{\partial \phi }$, because $y = \phi (q)$.
It is still decreasing despite the gradient sign change because of other features coupled geometrically to $p_{3,1}^y$.
Implicit surface object models are learned from sensory data, and the object surface is itself described by a nonlinear potential function, e.g. a Gaussian Process or SVR; see Steinke et al. (2005).

References

Argall, B. D., Chernova, S., Veloso, M. M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5), 469–483.
Bain, M., and Sammut, C. (1996). A framework for behavioural cloning. In Machine intelligence, vol 15 (pp. 103–129). Oxford: Oxford University Press.
Berniker, M., & Kording, K. (2008). Estimating the sources of motor errors for adaptation and generalization. Nature Neuroscience, 11(12), 1454–1461.
Billard, A., Epars, Y., Calinon, S., Cheng, G., & Schaal, S. (2004). Discovering optimal imitation strategies. Robotics and Autonomous Systems, Special Issue: Robot Learning from Demonstration, 47(2–3), 69–77.
Calinon, S., and Billard, A. (2007). Incremental learning of gestures by imitation in a humanoid robot. In HRI ’07: Proceedings of the ACM/IEEE International Conference on Human–Robot Interaction (pp. 255–262).
Call, J., and Carpenter, M. (2002). Three sources of information in social learning. Imitation in animals and artifacts, pp. 211–228.
Craig, J. J. (1989). Introduction to robotics: Mechanics and control (2nd ed.). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc. ISBN 0201095289.
Dragiev, S., Toussaint, M., and Gienger, M. (2011). Gaussian process implicit surface for object estimation and grasping. In IEEE International Conference on Robotics and Automation (ICRA).
Gienger, M., Toussaint, M., Jetchev, N., Bendig, A., and Goerick, C. (2008). Optimization of fluent approach and grasp motions. In 8th IEEE-RAS International Conference on Humanoid Robots.
Haindl, M., Somol, P., Ververidis, D., and Kotropoulos, C. (2006). Feature selection based on mutual correlation. In CIARP, pp. 569–577.
Hiraki, K., Sashima, A., & Phillips, S. (1998). From egocentric to allocentric spatial behavior: A computational model of spatial development. Adaptive Behavior, 6(3–4), 371–391.
Ho, E. S. L., Komura, T., & Tai, C.-L. (2010). Spatial relationship preserving character motion adaptation. ACM Transactions on Graphics, 29(4), 1–8.
Howard, M., Klanke, S., Gienger, M., Goerick, C., & Vijayakumar, S. (2009). A novel method for learning policies from variable constraint data. Autonomous Robots, 27, 105–121.
Jenkins, O. C., and Matarić, M. J. (2004). A spatio-temporal extension to isomap nonlinear dimension reduction. In 21st International Conference on Machine Learning (ICML).
Jetchev, N. (2012). Learning representations from motion trajectories: Analysis and applications to robot planning and control. PhD thesis, FU Berlin. Retrieved from http://www.diss.fu-berlin.de/diss/receive/FUDISS_thesis_000000037417. Accessed 1 Aug 2012.
Jetchev, N., and Toussaint, M. (2011). Task space retrieval using inverse feedback control. In 28th International Conference on Machine Learning (ICML), pp. 449–456.
Khansari-Zadeh, S. M., and Billard, A. (2010). Bm: An iterative algorithm to learn stable non-linear dynamical systems with gaussian mixture models. In IEEE International Conference on Robotics and Automation (ICRA), pp. 2381–2388.
Kroemer, O., Detry, R., Piater, J. H., and Peters, J. (2009). Active learning using mean shift optimization for robot grasping. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2610–2615.
Kroemer, O., Detry, R., Piater, J. H., & Peters, J. (2010). Grasping with vision descriptors and motor primitives. ICINCO, 2, 47–54.
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A tutorial on energy-based learning. In Predicting structured data.
Montavon, G., Braun, M., & Müller, K.-R. (2011). Kernel analysis of deep networks. Journal of Machine Learning Research, 12, 2563–2581.
Muehlig, M., Gienger, M., Steil, J. J., and Goerick, C. (2009). Automatic selection of task spaces for imitation learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4996–5002.
Myers, C. S., & Rabiner, L. R. (1981). A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal, 60(7), 1389–1409.
Nouri, A., & Littman, M. L. (2010). Dimension reduction and its application to model-based exploration in continuous spaces. Machine Learning, 81(1), 85–98.
Perkins, T. J., & Barto, A. G. (2002). Lyapunov design for safe reinforcement learning. Journal of Machine Learning Research, 3, 803–832.
Pomerleau, D. A. (1991). Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3, 88–97.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
Ratliff, N., Ziebart, B., Peterson, K., Bagnell, J. A., Hebert, M., Dey, A. K., and Srinivasa, S. (2009). Inverse optimal heuristic control for imitation learning. In Proceedings of AISTATS, pp. 424–431.
Ratliff, N. D., Bagnell, J. A., and Zinkevich, M. A. (2006). Maximum margin planning. In 23rd International Conference on Machine Learning (ICML), pp. 729–736.
Schaal, S., Peters, J., Nakanishi, J., and Ijspeert, A. J. (2003). Learning movement primitives. In International Symposium on Robotics Research, pp. 561–572.
Siciliano, B., & Khatib, O. (Eds.). (2008). Springer handbook of robotics. Berlin: Springer.
Slotine, J.-J., & Li, W. (1991). Applied nonlinear control. Upper Saddle River, NJ: Prentice Hall.
Steinke, F., Schölkopf, B., & Blanz, V. (2005). Support vector machines for 3D shape processing. Computer Graphics Forum, 24(3), 285–294.
Tegin, J., Ekvall, S., Kragic, D., Wikander, J., & Iliev, B. (2009). Demonstration-based learning and control for automatic grasping. Intelligent Service Robotics, 2, 23–30.
Toussaint, M. (2009). Robot trajectory optimization using approximate inference. In 26th International Conference on Machine Learning (ICML), pp. 1049–1056.
Toussaint, M. (2011). Robotics. University Lecture, 2011. Retrieved from http://userpage.fu-berlin.de/mtoussai/teaching/11-Robotics/. Accessed 1 Aug 2012.
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5), 800–815.
Wagner, T., Visser, U., & Herzog, O. (2004). Egocentric qualitative spatial knowledge representation for physical robots. Robotics and Autonomous Systems, 49(1–2), 25–42.


Acknowledgments

This work was supported by the German Research Foundation (DFG), Emmy Noether fellowship TO 409/1-3, and the EU FP7 project TOMSY.

Author information

Authors and Affiliations

FU Berlin, Berlin, Germany
Nikolay Jetchev
University of Stuttgart, Stuttgart, Germany
Marc Toussaint


Corresponding author

Correspondence to Nikolay Jetchev.

Appendix

1.1 Proof of proposition 1 for the direction of IK generated motion steps

Proposition 1 If $\varrho \rightarrow \infty $ then the IK solution $q_{t+1}$ minimizing Eq. (9) has the property that the next step $q_{t+1} - q_{t}$ is approximately proportional to the value function gradient $\mathcal {J}$ in a small region around $q_t$.

Proof

If $\varrho \rightarrow \infty $ then the term $||f \circ \phi (q) - f \circ \phi (q_t) + \delta ||^2$ of Eq. (9) is weighted so heavily that $C_{prior}$ and any other cost terms we might add can be neglected. Let $\mathcal {J}$ be the gradient of the value function $f \circ \phi (q)$ evaluated at $q = q_t$. Using the linearization $f \circ \phi (q_{t+1}) \approx f \circ \phi (q_t) + \mathcal {J}(q_{t+1} - q_t)$, we can apply the IK Equation (Toussaint 2011):

$$\begin{aligned} q_{t+1}&= q_t - \delta \mathcal {J}^{\sharp }\\ \mathcal {J}^{\sharp }&= {\left( \varrho \mathcal {J}^T\mathcal {J} + \mathbb {I} \right) }^{-1}\mathcal {J}^T\varrho \\&= \mathcal {J}^T {\left( \mathcal {J}\mathcal {J}^T + {\varrho }^{-1}\right) }^{-1} \;\xrightarrow {\;\varrho \rightarrow \infty \;}\; \frac{1}{||\mathcal {J}||^2}\mathcal {J}^T \end{aligned}$$

We have used the Woodbury identity and the fact that $\mathcal {J}\mathcal {J}^T = ||\mathcal {J}||^2$ in the case where we have a 1-dimensional task variable $y$ (in that case the Jacobian is a row vector gradient). The last expression is the limit for $\varrho \rightarrow \infty $, in which the regularizer ${\varrho }^{-1}$ vanishes. $\mathcal {J}^{\sharp }$ is called the pseudoinverse of $\mathcal {J}$. Thus, the steps generated by our motion model are proportional to $\mathcal {J}^T$ times the scalar $-\delta / ||\mathcal {J}||^2$, which is negative for $\delta > 0$.$\square $
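The limit used in the proof can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up row gradient `J`; the function names are illustrative and do not come from the paper. It compares the regularized form $(\varrho \mathcal{J}^T\mathcal{J} + \mathbb{I})^{-1}\mathcal{J}^T\varrho$, the equivalent form $\mathcal{J}^T(\mathcal{J}\mathcal{J}^T + \varrho^{-1})^{-1}$, and the limit $\mathcal{J}^T / ||\mathcal{J}||^2$ for increasing $\varrho$.

```python
import numpy as np

def regularized_pinv(J, rho):
    """(rho J^T J + I)^{-1} J^T rho for a 1 x n row Jacobian J."""
    n = J.shape[1]
    return np.linalg.solve(rho * J.T @ J + np.eye(n), J.T * rho)

def equivalent_form(J, rho):
    """J^T (J J^T + 1/rho)^{-1}, the second line of the derivation."""
    m = J.shape[0]
    return J.T @ np.linalg.inv(J @ J.T + np.eye(m) / rho)

def limit_form(J):
    """J^T / ||J||^2, the claimed limit for rho -> infinity."""
    return J.T / (J @ J.T).item()

# Hypothetical value-function gradient at q_t (illustrative numbers only).
J = np.array([[0.5, -1.0, 2.0]])

# Largest entrywise gap to the limit for growing rho.
rhos = (1.0, 1e2, 1e4, 1e6)
gaps = [np.abs(regularized_pinv(J, rho) - limit_form(J)).max() for rho in rhos]

for rho, gap in zip(rhos, gaps):
    print(f"rho={rho:g}  max gap to J^T/||J||^2: {gap:.3e}")
```

The gap shrinks roughly like $1/\varrho$, which is consistent with the step direction approaching $-\mathcal{J}^T$ scaled by $\delta / ||\mathcal{J}||^2$.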

1.2 Proof of proposition 2 for Lyapunov attractor properties of TRIC

Proposition 2 Suppose we have trained TRIC on a single trajectory $\{q_t, y_t\}_{t=1}^T$ and that $f \circ \phi (q_T)$ is a minimum of the value function. Additionally, we generate motion with $\varrho \rightarrow \infty $, i.e. very high weighting of the value function. Then the motion generated by the model in Eq. (8) fulfills the conditions of Theorem 1 and is thus asymptotically stable at the attractor subspace $Q' = \{q': \phi (q') = \phi (q_T) = y_T \}$.

Proof

Because of $\varrho \rightarrow \infty $ the term decreasing the value function $f$ will dominate the motion equation and we can ignore the effect of the other terms. Let’s construct $V(q) = f\circ \phi (q) - c_T$, where $c_T = f\circ \phi (q_T)$. Then the Lyapunov stability conditions hold:

(a) holds directly because of the assumption that $c_T= f\circ \phi (q_T)$ is a minimum, so any other joint state $q$ s.t. $\phi (q) \ne \phi (q_T)$ has a higher value $f\circ \phi (q)$.
(b) holds by the construction of $V(q)$, which directly implies that
$$\begin{aligned} V(q_T) = f\circ \phi (q_T) - c_T = 0 \end{aligned}$$
(c) holds because Proposition 1 applies under the assumption $\varrho \rightarrow \infty $: the steps of the motion model are proportional to the gradient $\mathcal {J}$ with a negative factor, so the motion model of Eq. (9) decreases the value $f\circ \phi (q_t)$ at every step.
(d) holds because $c_T= f\circ \phi (q_T)$ is a local minimum: once we reach a joint state $q'$ s.t. $\phi (q') = \phi (q_T)$, the gradient of the value function is 0 and no further decrease is possible.$\square $
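The argument can be illustrated on a toy problem. The sketch below is not the learned value function of TRIC: the task map `phi` (x-coordinate of a planar two-link arm with unit link lengths), the quadratic `f`, the target `Y_TARGET`, and the choice of shrinking the decrease $\delta$ in proportion to $V(q_t)$ to avoid overshooting are all assumptions made for illustration. It follows the step rule of Proposition 1, $q_{t+1} = q_t - \delta \mathcal{J}^T / ||\mathcal{J}||^2$, and records $V(q_t) = f\circ \phi (q_t) - c_T$ along the way.

```python
import numpy as np

Y_TARGET = 1.2  # hypothetical y_T = phi(q_T); here c_T = f(y_T) = 0

def phi(q):
    """Toy 1-D task map: x-coordinate of a planar two-link arm."""
    return np.cos(q[0]) + np.cos(q[0] + q[1])

def phi_jacobian(q):
    """Gradient of phi with respect to the two joint angles."""
    s01 = np.sin(q[0] + q[1])
    return np.array([-np.sin(q[0]) - s01, -s01])

def value(q):
    """V(q) = f(phi(q)) - c_T with the assumed f(y) = (y - y_T)^2."""
    return (phi(q) - Y_TARGET) ** 2

def value_gradient(q):
    """Gradient J of the value function via the chain rule."""
    return 2.0 * (phi(q) - Y_TARGET) * phi_jacobian(q)

def descend(q0, steps=40, alpha=0.5):
    """Iterate q <- q - delta * J^T / ||J||^2 with delta = alpha * V(q)."""
    q = np.asarray(q0, dtype=float)
    values = [value(q)]
    for _ in range(steps):
        grad = value_gradient(q)
        norm_sq = grad @ grad
        if norm_sq < 1e-30:  # gradient vanished: attractor subspace reached
            break
        delta = alpha * value(q)  # assumed schedule, shrinks near the minimum
        q = q - delta * grad / norm_sq
        values.append(value(q))
    return q, np.array(values)

q_final, values = descend([0.8, 0.9])
print("V decreases at every step:", bool(np.all(np.diff(values) < 0)))
print("final task error:", abs(phi(q_final) - Y_TARGET))
```

In this toy setting $V$ decreases at every step and the iterate ends on the set $\{q': \phi (q') = y_T\}$ rather than at a unique joint configuration, which mirrors the attractor subspace $Q'$ of the proposition.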

About this article

Cite this article

Jetchev, N., Toussaint, M. Discovering relevant task spaces using inverse feedback control. Auton Robot 37, 169–189 (2014). https://doi.org/10.1007/s10514-014-9384-1


Received: 06 January 2013
Accepted: 16 January 2014
Published: 09 February 2014
Issue Date: August 2014
DOI: https://doi.org/10.1007/s10514-014-9384-1
