Optimal action sequence generation for assistive agents in fixed horizon tasks

Baraka, Kim; Melo, Francisco S.; Couto, Marta; Veloso, Manuela

doi:10.1007/s10458-020-09458-7

Published: 27 April 2020

Volume 34, article number 33 (2020)

Autonomous Agents and Multi-Agent Systems

Kim Baraka (ORCID: orcid.org/0000-0003-4381-4234)^1,2,
Francisco S. Melo^2,3,
Marta Couto^3 &
Manuela Veloso^4

Abstract

Agents providing assistance to humans are faced with the challenge of automatically adjusting the level of assistance to ensure optimal performance. In this work, we argue that identifying the right level of assistance consists in balancing positive assistance outcomes and some (domain-dependent) measure of cost associated with assistive actions. Towards this goal, we contribute a general mathematical framework for structured tasks where an agent playing the role of a ‘provider’—e.g., therapist, teacher—assists a human ‘receiver’—e.g., patient, student. We specifically consider tasks where the provider agent needs to plan a sequence of actions over a fixed time horizon, where actions are organized along a hierarchy with increasing success probabilities, and some associated costs. The goal of the provider is to achieve a success with the lowest expected cost possible. We present OAssistMe, an algorithm that generates cost-optimal action sequences given the action parameters, and investigate several extensions of it, motivated by different potential application domains. We provide an analysis of the algorithms, including proofs for a number of properties of optimal solutions that, we show, align with typical human provider strategies. Finally, we instantiate our theoretical framework in the context of robot-assisted therapy tasks for children with Autism Spectrum Disorder (ASD). In this context, we present methods for determining action parameters based on a survey of domain experts and real child-robot interaction data. Our contributions unlock increased levels of flexibility for agents introduced in a variety of assistive contexts.

Acknowledgements

We would like to thank Ana Paiva for her input on the child-robot interaction study. We also thank the reviewers for their valuable suggestions. This research was partially supported by the CMUPERI/HCI/0051/2013 grant, associated with the CMU/Portugal INSIDE project (http://www.project-inside.pt/), as well as national funds through Fundação para a Ciência e a Tecnologia (FCT) with references UID/CEC/50021/2020 and SFRH/BD/128359/2017. The views and conclusions contained in this document are those of the authors only.

Author information

Authors and Affiliations

Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Kim Baraka
Instituto Superior Técnico, Universidade de Lisboa, Porto Salvo, Portugal
Kim Baraka & Francisco S. Melo
INESC-ID, Porto Salvo, Portugal
Francisco S. Melo & Marta Couto
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Manuela Veloso

Corresponding author

Correspondence to Kim Baraka.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

M. Veloso is currently Head of AI Research at JPMorgan Chase.

Appendix: Proofs

Our proofs are structured along the following three (mutually exclusive) cases:

a. $O^*_1>0$, or equivalently $R<\min _a{c(a)/p(a)}$
b. $O^*_1<0$, or equivalently $R>\min _a{c(a)/p(a)}$
c. $O^*_1=0$, or equivalently $R=\min _a{c(a)/p(a)}$
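
The equivalences in these three cases follow from $O^*_1=\min _a{\{c(a)-p(a)R\}}$. As a short sketch for case (a), assuming $p(a)>0$ for every action (an assumption made here so that the ratios are defined):

$$\begin{aligned} O^*_1>0&\iff c(a)-p(a)R>0 \;\text{ for all } a \\&\iff R<c(a)/p(a) \;\text{ for all } a \\&\iff R<\min _a{c(a)/p(a)} \end{aligned}$$

Case (b) follows in the same way with "for some a" in place of "for all a" and the inequalities reversed, and case (c) is the remaining possibility.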

Lemma 1

For any T, we have one of:

a. $0<O^*_T<\min _a{c(a)/p(a)}-R$
b. $0>O^*_T>\min _a{c(a)/p(a)}-R$
c. $0=O^*_T=\min _a{c(a)/p(a)}-R$

Proof

We use induction on T. Eq. (7), which applies in all three cases, gives: $O^*_T=\min _a{\{(1-p(a))O^{*}_{T-1}+c(a)-p(a)R\}}, \;\;\; O^*_1=\min _a{\{c(a)-p(a)R\}}$.

Case (a):

Base case: $0<O^*_1=\min _a{\{c(a)-p(a)R\}}<\min _a{c(a)/p(a)}-R$

Induction step: Assume $0<O^*_{T-1}<\min _a{c(a)/p(a)}-R$. Then $O^*_T$ is also positive: in Eq. (7), $(1-p(a))O^*_{T-1}\ge 0$ and $c(a)-p(a)R\ge O^*_1>0$ for every a (base case). Also, for all a:

$$\begin{aligned} O^*_T&\le (1-p(a))O^*_{T-1}+c(a)-p(a)R\\&<(1-p(a))(c(a)/p(a)-R)+c(a)-p(a)R =c(a)/p(a)-R \end{aligned}$$

By induction, $0<O^*_T<\min _a{c(a)/p(a)}-R$ for all T.

Case (b):

Base case: $0>O^*_1=\min _a{\{c(a)-p(a)R\}}>\min _a{c(a)/p(a)}-R$

Induction step: Assume $0>O^*_{T-1}>\min _a{c(a)/p(a)}-R$. Also let $a^\dagger =\arg \min _a{c(a)/p(a)}$, let $a^*$ be the optimal action selected at stage T, and let $a^{*(1)}$ be the optimal action selected at stage 1.

$$\begin{aligned} O^*_T\le (1-p(a^{*(1)}))O^*_{T-1}+c(a^{*(1)})-p(a^{*(1)})R<0 \end{aligned}$$

since $(1-p(a))O^*_{T-1}<0$ for any a, and $c(a^{*(1)})-p(a^{*(1)})R<0$ (base case).

Also, by the induction hypothesis:

$$\begin{aligned} O^*_T&=(1-p(a^*))O^*_{T-1}+c(a^*)-p(a^*)R \\&>(1-p(a^*))(c(a^\dagger )/p(a^\dagger )-R)+c(a^*)-p(a^*)R \\&=(1-p(a^*))c(a^\dagger )/p(a^\dagger )+c(a^*)-R \end{aligned}$$

Since $a^\dagger$ minimizes $c(a)/p(a)$, we have $c(a^\dagger )/p(a^\dagger )\le c(a^*)/p(a^*)$, i.e., $p(a^*)\le p(a^\dagger )c(a^*)/c(a^\dagger )$. Using this bound:

$O^*_T>\left[ 1-p(a^\dagger )c(a^*)/c(a^\dagger )\right] c(a^\dagger )/p(a^\dagger )+c(a^*)-R =c(a^\dagger )/p(a^\dagger )-R$

By induction, $0>O^*_T>\min _a{c(a)/p(a)}-R$ for all T.

Case (c) is easily proven by induction on T. $\square$
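
The bounds of Lemma 1 can be checked numerically on a small example. The following is a minimal sketch of the recursion in Eq. (7); the function name `optimal_values`, the convention $O^*_0=0$ (chosen so that the recursion reproduces $O^*_1$), and the action parameters are illustrative assumptions, not values from the article.

```python
from typing import List, Sequence


def optimal_values(p: Sequence[float], c: Sequence[float], R: float, T: int) -> List[float]:
    """Return [O*_1, ..., O*_T] computed with the recursion of Eq. (7).

    Assumption: O*_0 = 0, which reproduces O*_1 = min_a {c(a) - p(a) R}.
    """
    values = []
    prev = 0.0  # O*_0 (assumed convention)
    for _ in range(T):
        prev = min((1.0 - pa) * prev + ca - pa * R for pa, ca in zip(p, c))
        values.append(prev)
    return values


# Hypothetical action parameters (illustrative only, not from the article).
p = [0.2, 0.5, 0.8]  # success probabilities, increasing along the hierarchy
c = [1.0, 2.0, 4.0]  # action costs
limit_ratio = min(ca / pa for pa, ca in zip(p, c))  # min_a c(a)/p(a), here 4.0

for R in (1.0, 10.0):  # R = 1 falls in case (a), R = 10 in case (b)
    O = optimal_values(p, c, R, T=20)
    bound = limit_ratio - R
    lo, hi = min(0.0, bound), max(0.0, bound)
    assert all(lo < o < hi for o in O)  # Lemma 1: O*_T lies strictly between 0 and the bound
    print(f"R={R}: O*_1={O[0]:.4f}, O*_20={O[-1]:.4f}, bound={bound:.4f}")
```

With these numbers, every computed value stays strictly between 0 and $\min _a{c(a)/p(a)}-R$, on the positive side for R = 1 and on the negative side for R = 10.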

Lemma 2

$O^*_T$ is monotonic in T. In particular, it is one of:

a. Strictly increasing, i.e., $O^*_{T+1}>O^*_{T}$ for all T
b. Strictly decreasing, i.e., $O^*_{T+1}<O^*_{T}$ for all T
c. Constant, i.e., $O^*_{T+1}=O^*_{T}$ for all T

Proof

Let $a^*$ be the optimal action of stage T.

Case (a):

$O^*_T/O^*_{T-1}=1-p(a^*)+(c(a^*)-p(a^*)R)/O^*_{T-1}$

From Lemma 1, for any a: $0<O^*_{T-1}<c(a)/p(a)-R$, so $(c(a^*)-p(a^*)R)/O^*_{T-1}>p(a^*)$, hence: $O^*_T/O^*_{T-1}>1$, and $O^*_T>0$ for all T, which establishes that $O^*_T$ is strictly increasing.

Case (b):

The demonstration that $O^*_T/O^*_{T-1}>1$ is identical to case (a). Since $O^*_T<0$ for all T, it follows that $O^*_T$ is strictly decreasing.

Case (c) follows from the previous lemma. $\square$
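
For case (b), the strict decrease can also be checked directly from Eq. (7) and Lemma 1. The following is a reader's sketch rather than part of the original proof; it uses $a^\dagger =\arg \min _a{c(a)/p(a)}$ as in the proof of Lemma 1 and assumes $p(a^\dagger )>0$. For $T\ge 2$:

$$\begin{aligned} O^*_T&\le (1-p(a^\dagger ))O^*_{T-1}+c(a^\dagger )-p(a^\dagger )R \\&=O^*_{T-1}+p(a^\dagger )\left[ c(a^\dagger )/p(a^\dagger )-R-O^*_{T-1}\right] \\&<O^*_{T-1} \end{aligned}$$

The last step holds because Lemma 1, case (b), gives $O^*_{T-1}>c(a^\dagger )/p(a^\dagger )-R$, so the bracketed term is negative.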

Theorem 1

$O^*_T$ converges to $\min _a{c(a)/p(a)}-R$ as T goes to infinity.

Proof

Lemmas 1 and 2 imply convergence of $O^*_T$ in cases (a) and (b), since a bounded monotonic sequence converges. Furthermore, setting $O^*_{T-1}=O^*_T$ in Eq. (7) yields a single fixed point, $\min _a{c(a)/p(a)}-R$, which establishes the result.

Case (c) is trivial since $\min _a{c(a)/p(a)}-R=0$.$\square$
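
The fixed-point claim can be unpacked as follows. This is a sketch under the assumption that $p(a)>0$ for every action, writing $O$ for a candidate fixed point of Eq. (7):

$$\begin{aligned} O=\min _a{\{(1-p(a))O+c(a)-p(a)R\}}&\iff 0=\min _a{\{p(a)\left[ c(a)/p(a)-R-O\right] \}} \\&\iff O=\min _a{c(a)/p(a)}-R \end{aligned}$$

The second equivalence holds because, with every $p(a)>0$, the minimum is zero exactly when all bracketed terms are nonnegative and at least one of them vanishes.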

Theorem 2

If $\varvec{\varPi }^*$ is an optimal sequence, then it is monotonic in t. In particular, $\varvec{\varPi }^*$ is one of:

a. Nonincreasing, i.e., $a^*_1 \ge a^*_2 \ge \cdots \ge a^*_T$
b. Nondecreasing, i.e., $a^*_1 \le a^*_2 \le \cdots \le a^*_T$
c. Constant, i.e., $a^*_1= a^*_2=\cdots = a^*_T$

Proof

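Let $a'$ be an optimal action associated with $O^*_{T-1}$ and $a''$ an optimal action associated with $O^*_T$; that is, $a'$ and $a''$ minimize $(1-p(a))O+c(a)-p(a)R$ for $O=O^*_{T-1}$ and $O=O^*_T$, respectively. Then: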

$$\begin{aligned} (1-p(a''))O^*_T+c(a'')-p(a'')R&\le (1-p(a'))O^*_T+c(a')-p(a')R \\ (p(a')-p(a''))O^*_T&\le c(a')-c(a'')-R(p(a')- p(a'')) \end{aligned} \tag{15}$$

and

$$\begin{aligned} (1-p(a'))O^*_{T-1}+c(a')-p(a')R&\le (1-p(a''))O^*_{T-1}+c(a'')-p(a'')R \\ (p(a')-p(a''))O^*_{T-1}&\ge c(a')-c(a'')-R(p(a')- p(a'')) \end{aligned} \tag{16}$$

Combining Eqs. (15) and (16), we get:

$$\begin{aligned} (p(a')-p(a''))O^*_{T}\le (p(a')-p(a''))O^*_{T-1} \end{aligned}$$

We can conclude that:

If $p(a')>p(a'')$: $O^*_T\le O^*_{T-1}$ and if $p(a')<p(a'')$: $O^*_T\ge O^*_{T-1}$

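Assume $a'>a''$. Then $p(a')>p(a'')$, since success probabilities increase along the action hierarchy. From the previous result, $O^*_T\le O^*_{T-1}$, which in case (a) contradicts Lemma 2. Hence, $a'\le a''$: with one more remaining step, the optimal action is at least as high in the hierarchy, which establishes that $\varvec{\varPi }^*$ is nonincreasing.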

Similarly, we can show that, in case (b), $\varvec{\varPi }^*$ is nondecreasing.

In case (c), every step is equivalent to the single trial case, and the same action is selected at every trial, so the resulting sequence is constant.$\square$
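
The monotonicity of optimal sequences can be illustrated on a small example. The following is a minimal sketch, not the article's OAssistMe implementation: it assumes $O^*_0=0$, 0-based action indices ordered by increasing $p(a)$, and that the action at step t of a horizon-T task minimizes $(1-p(a))O^*_{T-t}+c(a)-p(a)R$. The function name `optimal_sequence` and the parameter values are hypothetical.

```python
from typing import List, Sequence


def optimal_sequence(p: Sequence[float], c: Sequence[float], R: float, T: int) -> List[int]:
    """Return 0-based action indices [a*_1, ..., a*_T] for a horizon of T steps.

    Assumptions: O*_0 = 0, and the action taken with k steps left minimizes
    (1 - p(a)) * O*_{k-1} + c(a) - p(a) * R, as in Eq. (7).
    """
    by_remaining = []  # by_remaining[k-1]: action chosen with k steps left
    prev = 0.0  # O*_0 (assumed convention)
    for _ in range(T):
        scores = [(1.0 - pa) * prev + ca - pa * R for pa, ca in zip(p, c)]
        best = min(range(len(scores)), key=scores.__getitem__)
        by_remaining.append(best)
        prev = scores[best]
    return by_remaining[::-1]  # step t = 1 has T steps left, step T has one


# Hypothetical action parameters (illustrative only, not from the article).
p = [0.2, 0.5, 0.8]  # success probabilities, increasing along the hierarchy
c = [1.0, 2.0, 4.0]  # action costs; min_a c(a)/p(a) = 4

print(optimal_sequence(p, c, R=1.0, T=6))   # case (a): [1, 1, 0, 0, 0, 0]
print(optimal_sequence(p, c, R=10.0, T=4))  # case (b): [1, 1, 1, 2]
```

With these numbers, the case (a) sequence is nonincreasing and the case (b) sequence is nondecreasing, in line with Theorem 2.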

About this article

Cite this article

Baraka, K., Melo, F.S., Couto, M. et al. Optimal action sequence generation for assistive agents in fixed horizon tasks. Auton Agent Multi-Agent Syst 34, 33 (2020). https://doi.org/10.1007/s10458-020-09458-7
