
Maximizing Reward in a Non-Stationary Mobile Robot Environment


Abstract

The ability of a robot to improve its performance on a task can be critical, especially in poorly known and non-stationary environments where the best action or strategy depends on the current state of the environment. In such systems, a good estimate of the current state of the environment is key to achieving high performance, however quantified. In this paper, we present an approach to state estimation in poorly known and non-stationary mobile robot environments, focusing on its application to a mine collection scenario, where performance is quantified using reward maximization. The approach is based on the use of augmented Markov models (AMMs), a sub-class of semi-Markov processes. We have developed an algorithm for incrementally constructing arbitrary-order AMMs on-line. It is used to capture the interaction dynamics between a robot and its environment in terms of behavior sequences executed during the performance of a task. For the purposes of reward maximization in a non-stationary environment, multiple AMMs monitor events at different timescales and provide statistics used to select the AMM likely to have a good estimate of the environmental state. AMMs with redundant or outdated information are discarded, while attempting to retain sufficient data to avoid fitting to noise. This approach has been successfully implemented on a mobile robot performing a mine collection task. In the context of this task, we first present experimental results validating our reward maximization performance criterion. We then incorporate our algorithm for state estimation using multiple AMMs, allowing the robot to select appropriate actions based on the estimated state of the environment. The approach is tested first with a physical robot, in a non-stationary environment with an abrupt change, then with a simulation, in a gradually shifting environment.
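To make the idea of on-line model construction and model selection concrete, the sketch below shows a deliberately simplified version of the scheme described above: a first-order transition-count model built incrementally from a stream of behavior symbols, with a helper that prefers the longest-history model whose estimates still agree with a short-timescale reference model. This is an illustrative assumption, not the authors' implementation; the paper's AMMs are arbitrary-order semi-Markov models with richer statistics, and the class name AMM, the select_model helper, the tol threshold, and the behavior symbols used here are all hypothetical.

```python
from collections import defaultdict

class AMM:
    """Minimal first-order stand-in for an augmented Markov model:
    transition counts plus bookkeeping on how much data it has seen."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[a][b]: observed a -> b transitions
        self.totals = defaultdict(int)                        # totals[a]: transitions observed out of a
        self.n_obs = 0                                        # behavior symbols incorporated so far
        self._prev = None

    def update(self, symbol):
        """Incorporate one observed behavior symbol on-line."""
        if self._prev is not None:
            self.counts[self._prev][symbol] += 1
            self.totals[self._prev] += 1
        self._prev = symbol
        self.n_obs += 1

    def prob(self, a, b):
        """Estimated transition probability P(b | a)."""
        return self.counts[a][b] / self.totals[a] if self.totals[a] else 0.0


def select_model(models, recent, behaviors, tol=0.15):
    """Prefer the longest-history model whose transition estimates still
    agree (within tol) with a short-timescale reference model; models
    that disagree are treated as outdated and skipped."""
    best = recent
    for m in sorted(models, key=lambda m: m.n_obs, reverse=True):
        disagree = any(abs(m.prob(a, b) - recent.prob(a, b)) > tol
                       for a in behaviors for b in behaviors)
        if not disagree:
            best = m
            break
    return best


# Toy usage: two models tracking the same behavior stream at different timescales.
long_term, short_term = AMM(), AMM()
for sym in ["search", "approach", "grasp", "home", "search", "approach"]:
    long_term.update(sym)
    short_term.update(sym)

chosen = select_model([long_term], short_term, ["search", "approach", "grasp", "home"])
print(chosen.prob("search", "approach"))
```

In this toy setting the long-history model agrees with the recent one, so it is selected; after an abrupt environmental change their transition estimates would diverge and the selection would fall back to the shorter-timescale model, mirroring the trade-off between retaining data and discarding outdated information.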




Cite this article

Goldberg, D., Matarić, M.J. Maximizing Reward in a Non-Stationary Mobile Robot Environment. Autonomous Agents and Multi-Agent Systems 6, 287–316 (2003). https://doi.org/10.1023/A:1022935725296
