
Maximizing Reward in a Non-Stationary Mobile Robot Environment


Abstract

The ability of a robot to improve its performance on a task can be critical, especially in poorly known and non-stationary environments where the best action or strategy depends on the current state of the environment. In such systems, a good estimate of the current state of the environment is key to achieving high performance, however quantified. In this paper, we present an approach to state estimation in poorly known and non-stationary mobile robot environments, focusing on its application to a mine collection scenario, where performance is quantified using reward maximization. The approach is based on the use of augmented Markov models (AMMs), a sub-class of semi-Markov processes. We have developed an algorithm for incrementally constructing arbitrary-order AMMs on-line. It is used to capture the interaction dynamics between a robot and its environment in terms of behavior sequences executed during the performance of a task. For the purposes of reward maximization in a non-stationary environment, multiple AMMs monitor events at different timescales and provide statistics used to select the AMM likely to have a good estimate of the environmental state. AMMs with redundant or outdated information are discarded, while attempting to retain sufficient data to avoid fitting to noise. This approach has been successfully implemented on a mobile robot performing a mine collection task. In the context of this task, we first present experimental results validating our reward maximization performance criterion. We then incorporate our algorithm for state estimation using multiple AMMs, allowing the robot to select appropriate actions based on the estimated state of the environment. The approach is tested first with a physical robot, in a non-stationary environment with an abrupt change, then with a simulation, in a gradually shifting environment.
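To make the idea of on-line model construction and model selection concrete, the sketch below shows a deliberately simplified version of the scheme described above: a first-order transition-count model built incrementally from a stream of behavior symbols, with a helper that prefers the longest-history model whose estimates still agree with a short-timescale reference model. This is an illustrative assumption, not the authors' implementation; the paper's AMMs are arbitrary-order semi-Markov models with richer statistics, and the class name AMM, the select_model helper, the tol threshold, and the behavior symbols used here are all hypothetical.

```python
from collections import defaultdict

class AMM:
    """Minimal first-order stand-in for an augmented Markov model:
    transition counts plus bookkeeping on how much data it has seen."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[a][b]: observed a -> b transitions
        self.totals = defaultdict(int)                        # totals[a]: transitions observed out of a
        self.n_obs = 0                                        # behavior symbols incorporated so far
        self._prev = None

    def update(self, symbol):
        """Incorporate one observed behavior symbol on-line."""
        if self._prev is not None:
            self.counts[self._prev][symbol] += 1
            self.totals[self._prev] += 1
        self._prev = symbol
        self.n_obs += 1

    def prob(self, a, b):
        """Estimated transition probability P(b | a)."""
        return self.counts[a][b] / self.totals[a] if self.totals[a] else 0.0


def select_model(models, recent, behaviors, tol=0.15):
    """Prefer the longest-history model whose transition estimates still
    agree (within tol) with a short-timescale reference model; models
    that disagree are treated as outdated and skipped."""
    best = recent
    for m in sorted(models, key=lambda m: m.n_obs, reverse=True):
        disagree = any(abs(m.prob(a, b) - recent.prob(a, b)) > tol
                       for a in behaviors for b in behaviors)
        if not disagree:
            best = m
            break
    return best


# Toy usage: two models tracking the same behavior stream at different timescales.
long_term, short_term = AMM(), AMM()
for sym in ["search", "approach", "grasp", "home", "search", "approach"]:
    long_term.update(sym)
    short_term.update(sym)

chosen = select_model([long_term], short_term, ["search", "approach", "grasp", "home"])
print(chosen.prob("search", "approach"))
```

In this toy setting the long-history model agrees with the recent one, so it is selected; after an abrupt environmental change their transition estimates would diverge and the selection would fall back to the shorter-timescale model, mirroring the trade-off between retaining data and discarding outdated information.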




Cite this article

Goldberg, D., Matarić, M.J. Maximizing Reward in a Non-Stationary Mobile Robot Environment. Autonomous Agents and Multi-Agent Systems 6, 287–316 (2003). https://doi.org/10.1023/A:1022935725296
