
Optimal action sequence generation for assistive agents in fixed horizon tasks

Autonomous Agents and Multi-Agent Systems

Abstract

Agents providing assistance to humans face the challenge of automatically adjusting the level of assistance to ensure optimal performance. In this work, we argue that identifying the right level of assistance consists of balancing positive assistance outcomes against some (domain-dependent) measure of cost associated with assistive actions. Towards this goal, we contribute a general mathematical framework for structured tasks in which an agent playing the role of a ‘provider’ (e.g., therapist, teacher) assists a human ‘receiver’ (e.g., patient, student). We specifically consider tasks where the provider agent needs to plan a sequence of actions over a fixed time horizon, where actions are organized along a hierarchy of increasing success probabilities and have associated costs. The goal of the provider is to achieve success at the lowest possible expected cost. We present OAssistMe, an algorithm that generates cost-optimal action sequences given the action parameters, and investigate several extensions of it, motivated by different potential application domains. We provide an analysis of the algorithms, including proofs of several properties of optimal solutions that, as we show, align with typical human provider strategies. Finally, we instantiate our theoretical framework in the context of robot-assisted therapy tasks for children with Autism Spectrum Disorder (ASD). In this context, we present methods for determining action parameters based on a survey of domain experts and real child-robot interaction data. Our contributions unlock increased flexibility for agents introduced into a variety of assistive contexts.
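To make the fixed-horizon setting concrete, here is a hedged sketch (not the authors' OAssistMe implementation) that scores a given action sequence by expanding it trial by trial: each executed trial incurs its action's cost, and the interaction stops at the first success. Following the recursion in Eq. (7) used in the appendix, we assume the quantity to minimize is the expected cost minus R times the probability of success, where R is the value of a success. The action parameters and helper names below are hypothetical, and exhaustive search is used only because it is easy to verify; it grows as len(p) ** T.

```python
from itertools import product

# Hypothetical two-level hierarchy: action 1 succeeds more often but costs more.
P = [0.1, 0.8]   # success probability p(a) of each action
C = [0.5, 3.0]   # cost c(a) of each action


def sequence_objective(seq, p, c, reward):
    """Expected cost minus reward * P(success) of a fixed action sequence.

    Trials run in order and stop at the first success, so the cost of
    seq[t] is only paid if every earlier trial failed.
    """
    p_reach = 1.0        # probability that the current trial is executed
    expected_cost = 0.0
    for a in seq:
        expected_cost += p_reach * c[a]
        p_reach *= 1.0 - p[a]   # probability that all trials so far failed
    return expected_cost - reward * (1.0 - p_reach)


def brute_force_best(horizon, p, c, reward):
    """Enumerate all len(p) ** horizon sequences; only viable for tiny inputs."""
    return min(product(range(len(p)), repeat=horizon),
               key=lambda seq: sequence_objective(seq, p, c, reward))


best = brute_force_best(5, P, C, reward=3.0)
print(best, sequence_objective(best, P, C, 3.0))
```

With these made-up numbers and R = 3, the search returns (1, 0, 0, 0, 0): the costlier, more effective action first, followed by the cheaper one.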


Figs. 1–10



Acknowledgements

We would like to thank Ana Paiva for her input on the child-robot interaction study. We also thank the reviewers for their valuable suggestions. This research was partially supported by the CMUPERI/HCI/0051/2013 grant, associated with the CMU/Portugal INSIDE project (http://www.project-inside.pt/), as well as national funds through Fundação para a Ciência e a Tecnologia (FCT) with references UID/CEC/50021/2020 and SFRH/BD/128359/2017. The views and conclusions contained in this document are those of the authors only.

Author information


Corresponding author

Correspondence to Kim Baraka.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

M. Veloso is currently Head of AI Research at JPMorgan Chase.

Appendix: Proofs

Our proofs are structured along the following three (mutually exclusive) cases:

  a. \(O^*_1>0\), or equivalently \(R<\min _a{c(a)/p(a)}\)
  b. \(O^*_1<0\), or equivalently \(R>\min _a{c(a)/p(a)}\)
  c. \(O^*_1=0\), or equivalently \(R=\min _a{c(a)/p(a)}\)
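The case split can be checked mechanically. The sketch below, with hypothetical helper names and parameters, computes \(O^*_1\) directly and classifies the case through the equivalent test on R against \(\min_a c(a)/p(a)\); it assumes \(p(a)>0\) for every action and uses a floating-point tolerance to detect case (c).

```python
import math


def first_stage_value(p, c, reward):
    """O*_1 = min_a {c(a) - p(a) R}: the best single-trial objective."""
    return min(ca - pa * reward for pa, ca in zip(p, c))


def classify_case(p, c, reward):
    """Return 'a', 'b' or 'c' by comparing R with min_a c(a)/p(a) (needs p(a) > 0)."""
    threshold = min(ca / pa for pa, ca in zip(p, c))
    if math.isclose(reward, threshold):
        return "c"  # tolerance guards against rounding in the ratio
    return "a" if reward < threshold else "b"


# Hypothetical three-level hierarchy; min_a c(a)/p(a) = 5 is attained by action 0.
P = [0.2, 0.5, 0.9]
C = [1.0, 3.0, 8.0]
for R in (4.0, 5.0, 10.0):
    print(R, classify_case(P, C, R), first_stage_value(P, C, R))
```

For these values, R = 4, 5, and 10 fall in cases (a), (c), and (b), and the sign of \(O^*_1\) agrees: positive, zero, and negative, respectively.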

Lemma 1

For any \(T\ge 1\), we have, in case (a), (b), or (c) respectively:

  a. \(0<O^*_T<\min _a{c(a)/p(a)}-R\)
  b. \(0>O^*_T>\min _a{c(a)/p(a)}-R\)
  c. \(0=O^*_T=\min _a{c(a)/p(a)}-R\)

Proof

We use induction on T. In all cases, Eq. (7) gives

$$\begin{aligned} O^*_T=\min _a{\{(1-p(a))O^{*}_{T-1}+c(a)-p(a)R\}}, \qquad O^*_1=\min _a{\{c(a)-p(a)R\}}. \end{aligned}$$
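The recursion above translates directly into a backward dynamic program. The sketch below is our own illustration under stated assumptions, not the paper's OAssistMe code: actions are indices into parallel lists p and c ordered along the hierarchy, the recursion is seeded with \(O^*_0=0\) (which reproduces \(O^*_1\)), and ties are broken toward the lower action. In this model the receiver only continues after a failure, so the action chosen with k trials remaining does not depend on earlier actions, and the optimal sequence is the per-stage choices read from k = T down to k = 1. Function names and parameter values are hypothetical.

```python
def optimal_values(p, c, reward, horizon):
    """Return (values, stage_actions) from the recursion of Eq. (7).

    values[k - 1] is O*_k, the optimal objective with k trials remaining;
    stage_actions[k - 1] is an action attaining it.
    """
    values, stage_actions = [], []
    prev = 0.0  # O*_0 = 0 makes the first iteration compute O*_1
    for _ in range(horizon):
        scores = [(1 - pa) * prev + ca - pa * reward for pa, ca in zip(p, c)]
        best = min(range(len(scores)), key=scores.__getitem__)
        values.append(scores[best])
        stage_actions.append(best)
        prev = scores[best]
    return values, stage_actions


def optimal_sequence(p, c, reward, horizon):
    """Optimal actions in execution order (the first trial has `horizon` left)."""
    _, stage_actions = optimal_values(p, c, reward, horizon)
    return stage_actions[::-1]


# Hypothetical parameters: increasing p and c along the hierarchy, R = 10.
P, C = [0.2, 0.5, 0.9], [1.0, 3.0, 8.0]
print(optimal_values(P, C, 10.0, 4))
print(optimal_sequence(P, C, 10.0, 4))
```

Here the values are approximately -2, -3, -3.5, and -3.8, and the sequence is [0, 1, 1, 1]. The dynamic program needs on the order of T times the number of actions evaluations, instead of enumerating every sequence.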

Case (a):

Base case: \(0<O^*_1=\min _a\{c(a)-p(a)R\}<\min _a{c(a)/p(a)}-R\)

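Induction step: Assume \(0<O^*_{T-1}<\min _a{c(a)/p(a)}-R\). Then \(O^*_T>0\), since every term in the minimum of Eq. (7) is positive: \((1-p(a))O^*_{T-1}\ge 0\) by the hypothesis, and \(c(a)-p(a)R>0\) by the base case. Moreover, since \(O^*_{T-1}<c(a)/p(a)-R\) for every a, we have, for all a: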

$$\begin{aligned} \begin{aligned} O^*_T&\le (1-p(a))O^*_{T-1}+c(a)-p(a)R\\&<(1-p(a))(c(a)/p(a)-R)+c(a)-p(a)R =c(a)/p(a)-R \end{aligned} \end{aligned}$$

By induction, \(0<O^*_T<\min _a{c(a)/p(a)}-R\) for all T.

Case (b):

Base case: \(0>O^*_1=\min _a\{c(a)-p(a)R\}>\min _a{c(a)/p(a)}-R\)

Induction step: Assume \(0>O^*_{T-1}>\min _a{c(a)/p(a)}-R\). Also let \(a^\dagger =\arg \min _a{c(a)/p(a)}\), let \(a^*\) be the optimal action selected at stage T, and let \(a^{*(1)}\) be the optimal action selected at stage 1.

$$\begin{aligned} O^*_T\le (1-p(a^{*(1)}))O^*_{T-1}+c(a^{*(1)})-p(a^{*(1)})R<0 \end{aligned}$$

since \((1-p(a))O^*_{T-1}<0\) for any a, and \(c(a^{*(1)})-p(a^{*(1)})R<0\) (base case).

Also, since \(O^*_{T-1}>c(a^\dagger )/p(a^\dagger )-R\) by the induction hypothesis:

$$\begin{aligned} \begin{aligned} O^*_T&=(1-p(a^*))O^*_{T-1}+c(a^*)-p(a^*)R \\&>(1-p(a^*))(c(a^\dagger )/p(a^\dagger )-R)+c(a^*)-p(a^*)R \\&=(1-p(a^*))c(a^\dagger )/p(a^\dagger )+c(a^*)-R \end{aligned} \end{aligned}$$

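By definition of \(a^\dagger\), \(c(a^\dagger )/p(a^\dagger )\le c(a^*)/p(a^*)\), i.e., \(p(a^*)\le p(a^\dagger )c(a^*)/c(a^\dagger )\) for positive costs. Using this bound:

$$\begin{aligned} O^*_T&>\left[ 1-p(a^\dagger )c(a^*)/c(a^\dagger )\right] c(a^\dagger )/p(a^\dagger )+c(a^*)-R \\&=c(a^\dagger )/p(a^\dagger )-R \end{aligned}$$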

By induction, \(0>O^*_T>\min _a{c(a)/p(a)}-R\) for all T.

Case (c) follows by induction on T: if \(O^*_{T-1}=0\), Eq. (7) reduces to \(O^*_T=\min _a\{c(a)-p(a)R\}=O^*_1=0\). \(\square\)
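Lemma 1 can be sanity-checked numerically; this is a check on particular inputs, not a proof. The sketch below iterates Eq. (7) from \(O^*_0=0\) and tests the strict bounds of cases (a) and (b) for hypothetical parameters with \(0<p(a)<1\). The horizon is kept moderate because \(O^*_T\) approaches the bound geometrically, and a very long run would eventually hit floating-point resolution.

```python
def value_sequence(p, c, reward, horizon):
    """[O*_1, ..., O*_T] from the recursion of Eq. (7), seeded with O*_0 = 0."""
    values, prev = [], 0.0
    for _ in range(horizon):
        prev = min((1 - pa) * prev + ca - pa * reward for pa, ca in zip(p, c))
        values.append(prev)
    return values


def satisfies_lemma1(p, c, reward, horizon=30):
    """True if every O*_T lies strictly between 0 and min_a c(a)/p(a) - R."""
    bound = min(ca / pa for pa, ca in zip(p, c)) - reward
    lo, hi = min(0.0, bound), max(0.0, bound)
    return all(lo < v < hi for v in value_sequence(p, c, reward, horizon))


P, C = [0.2, 0.5, 0.9], [1.0, 3.0, 8.0]   # hypothetical; min c/p = 5
print(satisfies_lemma1(P, C, 4.0), satisfies_lemma1(P, C, 10.0))  # cases (a), (b)
```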

Lemma 2

\(O^*_T\) is monotonic in T. In particular, in case (a), (b), or (c) respectively, it is:

  a. Strictly increasing, i.e., \(O^*_{T+1}>O^*_{T}\) for all T
  b. Strictly decreasing, i.e., \(O^*_{T+1}<O^*_{T}\) for all T
  c. Constant, i.e., \(O^*_{T+1}=O^*_{T}\) for all T

Proof

Let \(a^*\) be the optimal action of stage T.

Case (a):

\(O^*_T/O^*_{T-1}=1-p(a^*)+(c(a^*)-p(a^*)R)/O^*_{T-1}\)

From Lemma 1, for any a: \(0<O^*_{T-1}<c(a)/p(a)-R\), so \((c(a^*)-p(a^*)R)/O^*_{T-1}>p(a^*)\), hence: \(O^*_T/O^*_{T-1}>1\), and \(O^*_T>0\) for all T, which establishes that \(O^*_T\) is strictly increasing.

Case (b):

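Let \(a^\dagger =\arg \min _a{c(a)/p(a)}\). By Lemma 1, \(O^*_{T-1}>c(a^\dagger )/p(a^\dagger )-R\), so

$$\begin{aligned} O^*_T\le (1-p(a^\dagger ))O^*_{T-1}+c(a^\dagger )-p(a^\dagger )R =O^*_{T-1}+p(a^\dagger )\left[ c(a^\dagger )/p(a^\dagger )-R-O^*_{T-1}\right] <O^*_{T-1}. \end{aligned}$$

Hence \(O^*_T\) is strictly decreasing; since \(O^*_T<0\) for all T, this is equivalent to \(O^*_T/O^*_{T-1}>1\).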

Case (c) follows from the previous lemma. \(\square\)
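A similar hedged check for Lemma 2 labels the direction of \(O^*_T\) over a finite horizon. The parameters and helper names are hypothetical; for these values, R = 4, 10, and 5 fall in cases (a), (b), and (c).

```python
def value_sequence(p, c, reward, horizon):
    """[O*_1, ..., O*_T] from the recursion of Eq. (7), seeded with O*_0 = 0."""
    values, prev = [], 0.0
    for _ in range(horizon):
        prev = min((1 - pa) * prev + ca - pa * reward for pa, ca in zip(p, c))
        values.append(prev)
    return values


def direction(values):
    """Classify a finite sequence as strictly increasing/decreasing or constant."""
    steps = [b - a for a, b in zip(values, values[1:])]
    if all(s > 0 for s in steps):
        return "increasing"
    if all(s < 0 for s in steps):
        return "decreasing"
    if all(s == 0 for s in steps):  # exact zeros occur when O*_1 is exactly 0
        return "constant"
    return "not monotonic"


P, C = [0.2, 0.5, 0.9], [1.0, 3.0, 8.0]   # hypothetical; min c/p = 5
for R in (4.0, 10.0, 5.0):
    print(R, direction(value_sequence(P, C, R, 30)))
```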

Theorem 1

\(O^*_T\) converges to \(\min _a{c(a)/p(a)}-R\) as T goes to infinity.

Proof

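In cases (a) and (b), Lemmas 1 and 2 show that \(O^*_T\) is monotonic and bounded, hence convergent. Since the right-hand side of Eq. (7) is continuous in \(O^*_{T-1}\), the limit x must be a fixed point of Eq. (7), i.e., \(\min _a\{p(a)\left[ c(a)/p(a)-R-x\right] \}=0\). This equation has the single solution \(x=\min _a{c(a)/p(a)}-R\): for smaller x every term is positive, and for larger x the term for \(\arg \min _a{c(a)/p(a)}\) is negative. This establishes the result.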

Case (c) is trivial since \(\min _a{c(a)/p(a)}-R=0\).\(\square\)
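As a numerical illustration of Theorem 1 (hypothetical parameters, not data from the paper), the sketch below iterates Eq. (7) and reports the distance between \(O^*_T\) and \(\min_a c(a)/p(a)-R\); for these values the gap shrinks steadily as T grows.

```python
def gap_to_limit(p, c, reward, horizon):
    """|O*_T - (min_a c(a)/p(a) - R)| after `horizon` steps of Eq. (7)."""
    limit = min(ca / pa for pa, ca in zip(p, c)) - reward
    value = 0.0  # O*_0
    for _ in range(horizon):
        value = min((1 - pa) * value + ca - pa * reward for pa, ca in zip(p, c))
    return abs(value - limit)


P, C = [0.2, 0.5, 0.9], [1.0, 3.0, 8.0]   # hypothetical; limits are 1 and -5
for R in (4.0, 10.0):
    print(R, [gap_to_limit(P, C, R, T) for T in (1, 10, 100)])
```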

Theorem 2

If \(\varvec{\varPi }^*\) is an optimal sequence, then it is monotonic in t. In particular, in case (a), (b), or (c) respectively, \(\varvec{\varPi }^*\) is:

  a. Nonincreasing, i.e., \(a^*_1 \ge a^*_2 \ge \cdots \ge a^*_T\)
  b. Nondecreasing, i.e., \(a^*_1 \le a^*_2 \le \cdots \le a^*_T\)
  c. Constant, i.e., \(a^*_1= a^*_2=\cdots = a^*_T\)

Proof

Let \(a'\) be an optimal action associated with \(O^*_{T-1}\) and \(a''\) an optimal action associated with \(O^*_T\). Then:

$$\begin{aligned} (1-p(a''))O^*_T+c(a'')-p(a'')R &\le (1-p(a'))O^*_T+c(a')-p(a')R \\ (p(a')-p(a''))O^*_T&\le c(a')-c(a'')-R(p(a')- p(a'')) \end{aligned} \tag{15}$$

and

$$\begin{aligned} (1-p(a'))O^*_{T-1}+c(a')-p(a')R &\le (1-p(a''))O^*_{T-1}+c(a'')-p(a'')R \\ (p(a')-p(a''))O^*_{T-1}&\ge c(a')-c(a'')-R(p(a')- p(a'')) \end{aligned} \tag{16}$$

Combining Eqs. (15) and (16), we get:

$$\begin{aligned} (p(a')-p(a''))O^*_{T}\le (p(a')-p(a''))O^*_{T-1} \end{aligned}$$

We can conclude that:

If \(p(a')>p(a'')\): \(O^*_T\le O^*_{T-1}\) and if \(p(a')<p(a'')\): \(O^*_T\ge O^*_{T-1}\)

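Assume \(a'>a''\). Since success probabilities increase along the action hierarchy, \(p(a')>p(a'')\), and the previous result gives \(O^*_T\le O^*_{T-1}\). In case (a), this contradicts Lemma 2, by which \(O^*_T>O^*_{T-1}\). Hence \(a'\le a''\): the action chosen with T trials remaining is no higher in the hierarchy than the one chosen with T+1 trials remaining, which establishes that \(\varvec{\varPi }^*\) is nonincreasing.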

Similarly, we can show that, in case (b), \(\varvec{\varPi }^*\) is nondecreasing.

In case (c), \(O^*_{T-1}=0\) for all T (Lemma 1), so every stage of Eq. (7) reduces to the single-trial problem \(\min _a\{c(a)-p(a)R\}\). The same action is therefore selected at every trial, and the resulting sequence is constant. \(\square\)
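The monotonicity in Theorem 2 can also be observed on small examples. The sketch below rebuilds the optimal sequence from the per-stage minimizers of Eq. (7), with ties broken toward the lower action, for two hypothetical parameter sets: one in case (a) and one in case (b). It illustrates the theorem rather than proving it, and the helper names are our own.

```python
def optimal_sequence(p, c, reward, horizon):
    """Per-stage minimizers of Eq. (7), returned in execution order."""
    value, stage_actions = 0.0, []   # value holds O*_{k-1}, starting at O*_0 = 0
    for _ in range(horizon):
        scores = [(1 - pa) * value + ca - pa * reward for pa, ca in zip(p, c)]
        best = min(range(len(scores)), key=scores.__getitem__)
        stage_actions.append(best)
        value = scores[best]
    return stage_actions[::-1]  # first trial has `horizon` trials remaining


def is_nonincreasing(seq):
    return all(x >= y for x, y in zip(seq, seq[1:]))


def is_nondecreasing(seq):
    return all(x <= y for x, y in zip(seq, seq[1:]))


# Case (a): R = 3 < min c/p = 3.75 (hypothetical two-action hierarchy).
seq_a = optimal_sequence([0.1, 0.8], [0.5, 3.0], 3.0, 5)
# Case (b): R = 10 > min c/p = 5 (hypothetical three-action hierarchy).
seq_b = optimal_sequence([0.2, 0.5, 0.9], [1.0, 3.0, 8.0], 10.0, 4)
print(seq_a, is_nonincreasing(seq_a))
print(seq_b, is_nondecreasing(seq_b))
```

In the case (a) example the provider starts with the more effective action and then switches to the cheaper one ([1, 0, 0, 0, 0]); in the case (b) example it starts cheap and escalates ([0, 1, 1, 1]).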


About this article


Cite this article

Baraka, K., Melo, F.S., Couto, M. et al. Optimal action sequence generation for assistive agents in fixed horizon tasks. Auton Agent Multi-Agent Syst 34, 33 (2020). https://doi.org/10.1007/s10458-020-09458-7


  • DOI: https://doi.org/10.1007/s10458-020-09458-7
