Planning and execution through variable resolution planning
Introduction
Automated Planning (AP) is the branch of Artificial Intelligence that studies the generation of an ordered set of actions (a plan) that allows a system to transition from a given initial state to a state where a set of goals has been achieved. AP has been successfully used to solve real-world problems such as planning Mars exploration missions [1] or controlling underwater vehicles [2]. Despite these examples, the application of AP systems to stochastic and dynamic environments still presents some challenges, mainly because these scenarios increase the complexity of the planning and execution process: (i) new information about the environment can be discovered during action execution, modifying the structure of the planning task; (ii) the execution of an action can fail, which in turn prevents the execution of the rest of the plan; (iii) the execution of the actions in the plan can generate states from which the rest of the plan cannot be successfully executed (dead-ends); and (iv) plans may need to be generated quickly to offer real-time interaction between the AP system and the environment. For these reasons, the process of generating a plan of actions can be prohibitively expensive in this kind of scenario.
There are two main (extreme) approaches to solving problems in stochastic and dynamic scenarios: deliberative and reactive. At one extreme, we find deliberative systems, which are based on interleaving AP and execution with full or partial observability. If we have information about the dynamics of the environment (failures in the actuators of a robot, the structure of the terrain, accuracy of sensors), we can define a domain model with probabilistic information and full observability (such as in PPDDL [3] or RDDL [4]). Then, one alternative consists of building conditional plans [5], where plans take into account all possible outcomes. Another approach consists of generating a set of policies by solving the problem as a Markov Decision Process (MDP) [6], [7], [8].
Usually, however, the dynamics of the environment are not known or cannot be easily modeled. In that case, we have two alternatives. First, we can learn the dynamics and then apply the previous approaches. However, the learning effort is huge except for small tasks [9]. Another solution, and the most widely used one, consists of using a deterministic domain model and replanning or repairing the plan when an execution failure is detected (e.g., the robot is not in the expected place). When replanning [10], the planner generates an initial applicable plan and executes it, one action at a time. If an unexpected state is detected, the system generates a new plan from scratch. This process is repeated until the system reaches the problem goals. Therefore, at each planning (or replanning) step, including the initial one, the system devotes a huge computational effort to computing a valid plan (an applicable plan that achieves the goals), even though most of it will not be used. When repairing a running plan [11], [12], [13], the planner generates an initial applicable plan and executes it. If an unexpected state is detected, the system generates a new plan by reusing the previously generated plan and adding or removing some actions. In general, deliberative systems require a huge computational effort to generate a complete and sound plan, and, depending on the dynamics of the environment, the plan will most probably not be executed in full.
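The replanning cycle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the one-dimensional corridor domain, the breadth-first `bfs_plan` planner, the `execute` function that makes an action fail (leaving the robot in place) with probability `slip`, and all names. It is not the planner or simulator used in this work; it only shows a deterministic model being replanned from scratch whenever the observed state differs from the expected one.

```python
import random
from collections import deque


def bfs_plan(start, goal, successors):
    """Deterministic planner: list of (action, expected_state) pairs, or None."""
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:
            plan = []
            while parent[state] is not None:
                prev, action = parent[state]
                plan.append((action, state))
                state = prev
            return plan[::-1]
        for action, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, action)
                frontier.append(nxt)
    return None


def corridor_successors(state, length=6):
    """Hypothetical deterministic model: a robot moving along cells 0..length-1."""
    moves = []
    if state + 1 < length:
        moves.append(("right", state + 1))
    if state - 1 >= 0:
        moves.append(("left", state - 1))
    return moves


def execute(state, action, rng, slip):
    """Hypothetical noisy environment: the action has no effect with probability slip."""
    if rng.random() < slip:
        return state
    return state + 1 if action == "right" else state - 1


def replanning_loop(start, goal, rng, slip=0.3, max_steps=100):
    """Plan, execute one action at a time, and replan from scratch on a mismatch."""
    state, planner_calls, steps = start, 0, 0
    while state != goal and steps < max_steps:
        plan = bfs_plan(state, goal, corridor_successors)
        planner_calls += 1
        if plan is None:
            break
        for action, expected in plan:
            state = execute(state, action, rng, slip)
            steps += 1
            if state != expected:
                break  # unexpected state: the rest of the plan is discarded
    return state, planner_calls, steps
```

Each call to `bfs_plan` computes a complete plan to the goal, yet after a failure only the executed prefix was of any use, which is the wasted effort the paragraph above points out.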
At the other extreme, there are several approaches that solve problems in stochastic and dynamic scenarios using reactive techniques. These systems are based on greedily selecting the next action to be applied according to some knowledge that has been programmed or learned previously. If the knowledge about the environment is only used to select the next action, we can consider it a pure reactive system without deliberation, where the system perceives and generates the next action in a continuous cycle. Systems based on the Subsumption architecture [14], [15] are built using a set of control layers, where different layers are interconnected with signals. During each execution step, one layer is chosen depending on the information perceived. Other reactive approaches are based on building reactive behavioral navigation controllers using neural networks [16], [17] or fuzzy logic [18], [19]. In general, reactive systems require much less computational effort and are “mostly” blind with respect to the future; they usually ignore the impact of the selected action on the next actions and states. Thus, they often get trapped in local minima or dead-ends.
In this paper, we propose Variable Resolution Planning (vrp) for interleaving planning and execution in stochastic and dynamic environments. Our research has been inspired by the work of Zickler and Veloso [20], where a motion planning technique is used to generate a collision-free trajectory from an initial state to a goal state. They consider the far future with a different level of detail, by selectively ignoring the physical interactions with dynamic objects. Similarly, vrp is based on two main concepts: (i) most planning effort is devoted to computing a valid plan head of a given length; and (ii) the rest of the plan is only generated by checking for potential reachability under a relaxed model of the actions. Actions are simplified by removing some domain details, which decreases the computational effort while still helping to avoid dead-ends. The main advantage of our approach is that it requires much less search time than traditional planning approaches that compute a valid complete plan (improving over pure deliberative approaches), while retaining their capability of reasoning into the future (improving over pure reactive approaches). In addition, our technique can be easily parameterized by appropriately setting the length of the plan head, so that its behavior gradually moves from a more deliberative approach (large values) to a more reactive approach (small values). At the extremes, with the smallest value vrp becomes an almost pure reactive system, while with the largest value vrp behaves as a standard deliberative planner.
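The two concepts above can be sketched on a toy problem. The code below is an illustrative assumption, not the vrp algorithm itself: the rooms, the `door_open` detail, the breadth-first search, and the parameter name `k` (used here for the plan-head length) are all hypothetical. Nodes at depth smaller than `k` are expanded with the detailed model; deeper nodes are expanded with a relaxed model that ignores whether the door is open, so the tail of the plan only checks potential reachability.

```python
from collections import deque

# Hypothetical room graph: "pit" is a dead-end, "goal" lies behind a door in "hall".
EDGES = {"start": ["pit", "hall"], "pit": [], "hall": ["goal"], "goal": []}


def full_successors(state):
    """Detailed model: entering the goal room requires the door to be open."""
    room, door_open = state
    result = []
    for nxt in EDGES[room]:
        if nxt == "goal" and not door_open:
            continue
        result.append((f"go-{nxt}", (nxt, door_open)))
    if room == "hall" and not door_open:
        result.append(("open-door", (room, True)))
    return result


def relaxed_successors(state):
    """Relaxed model: the door detail is ignored, only connectivity remains."""
    room, door_open = state
    return [(f"go-{nxt}", (nxt, door_open)) for nxt in EDGES[room]]


def variable_resolution_search(start, is_goal, k):
    """Breadth-first search: detailed model above depth k, relaxed model from depth k on."""
    frontier = deque([(start, 0, [])])
    seen = {(start, 0 >= k)}
    while frontier:
        state, depth, plan = frontier.popleft()
        if is_goal(state):
            return plan
        expand = relaxed_successors if depth >= k else full_successors
        for action, nxt in expand(state):
            key = (nxt, depth + 1 >= k)  # the same state may be seen under both models
            if key not in seen:
                seen.add(key)
                frontier.append((nxt, depth + 1, plan + [action]))
    return None


def in_goal_room(state):
    return state[0] == "goal"


short_horizon = variable_resolution_search(("start", False), in_goal_room, k=1)
long_horizon = variable_resolution_search(("start", False), in_goal_room, k=10)
```

With `k=1`, only the first action is planned in full detail, yet the relaxed tail still steers the search away from the dead-end room; with a large `k`, the search returns the complete detailed plan, including the door-opening step.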
This paper is organized as follows. Section 2 formally defines the representation of the planning task in classical planning. Section 3 presents an overview of vrp. Section 4 introduces the concept of predicate abstraction and how it can be deployed in AP. Section 5 describes the algorithms related to vrp. Section 6 describes the planning and execution architecture used to deploy vrp. Section 7 presents an experimental evaluation of vrp in five different domains. Section 8 discusses work related to our approach. Finally, Section 9 concludes and introduces future work.
Planning formalization
There are different types of planning tasks defined in the literature. In this paper, we consider the sequential classical planning task, which is encoded in the propositional fragment of the Planning Domain Definition Language (PDDL) 2.2. It includes advanced features like numeric fluents, ADL conditions, effects and derived predicates (axioms).
Definition 1 (Planning Task). A planning task can be defined as a tuple whose components include a finite set of grounded literals (also known as facts or atoms) and a finite set of
Variable resolution planning architecture
Fig. 1 gives an overview of the vrp architecture, showing its main phases and how they are connected. The vrp technique is composed of three different phases. In the first phase, called Knowledge Gathering, information about the planning task is extracted. There are different ways to extract this information, and some of these approaches are described in Section 5. The information extracted in the previous phase is used by the Abstraction Generation
Definitions
Our approach is based on removing some details about the future from the planning task to speed up search on hard problems that are executed in a stochastic and/or dynamic environment. We therefore first review how PDDL defines planning tasks. PDDL mainly uses a kind of first-order logic. The set of predicates allows us to represent actions, states and goals. For instance, in order to describe the current location of a robot, we define the predicate (at). Predicate at describes that
The variable resolution planning algorithm
As discussed previously, predicates can be removed from the original planning task to reduce the search space and generate a new, smaller abstracted search space. First, however, we need to choose which predicates will be selected (discussed in Knowledge gathering) and when they will be removed during search (discussed in Search).
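As a minimal sketch of what removing predicates from a planning task can look like, the code below represents grounded atoms as tuples whose first element is the predicate name, and actions as precondition, add and delete sets. The `Action` class, the helper names and the `door-open` predicate are assumptions introduced for illustration; only the (at) predicate comes from the text, and the selection of predicates to remove is simply given by hand here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def drop(atoms, removed):
    """Keep only the atoms whose predicate name is not in `removed`."""
    return frozenset(a for a in atoms if a[0] not in removed)


def abstract_action(action, removed):
    """Remove the selected predicates from preconditions and effects."""
    return Action(
        action.name,
        drop(action.pre, removed),
        drop(action.add, removed),
        drop(action.delete, removed),
    )


def applicable(action, state):
    return action.pre <= state


def apply_action(action, state):
    return (state - action.delete) | action.add


# Hypothetical grounded action: the robot moves from a to b through a door.
move = Action(
    name="move-r-a-b",
    pre=frozenset({("at", "r", "a"), ("door-open", "a", "b")}),
    add=frozenset({("at", "r", "b")}),
    delete=frozenset({("at", "r", "a")}),
)
state = frozenset({("at", "r", "a")})  # the door is not known to be open
relaxed_move = abstract_action(move, removed={"door-open"})
```

In this state the original `move` is not applicable because the door atom is missing, whereas `relaxed_move` is, which is how an abstraction of this kind enlarges the set of applicable actions while shrinking the set of atoms the search has to track.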
Planning and execution environment
This section presents the planning and execution environment used to deploy vrp. We have implemented a planning, execution and replanning loop based on a light version of the PELEA architecture [32], [33], which uses the simulator MDPSim [34] to emulate the execution of plans. MDPSim executes the actions
Experimental results
This section presents the experimental results of using vrp for planning and execution. We compare its performance against its closest competitors among planning and execution systems based on PDDL and its variants:
- Classical planning, using planning and replanning when an execution failure is detected. We use LAMA11, an anytime planner developed within the Fast-Downward framework [27]. Once LAMA11 has found a first solution, it continues to search for better solutions until it
Related work
Abstractions and Automated Planning (AP) techniques have been combined in the literature in different ways. Our work focuses on applying abstractions to Automated Planning to decrease the computational overhead in stochastic or dynamic environments. There are three principal trends related to our work: (i) approaches based on generating abstractions in AP to speed up the planning process or to increase its capability to solve hard problems; (ii) approaches interleaving planning and
Conclusions and future work
In this paper, we have presented Variable Resolution Planning (vrp), a novel technique that uses an abstraction mechanism that dynamically removes some predicates during the planning process. Our approach is able to significantly cut down computational effort of the search process. The corresponding abstraction is only used in nodes of the search tree that are far away from the initial state of the search. The exact computation of a plan in those nodes is not crucial, given that most probably
Acknowledgments
This research has been partially supported by the Spanish MICINN projects TIN2011-27652-C03-02, TIN2012-38079-C03-02 and Comunidad de Madrid—UC3M (CCG10-UC3M/TIC-5597). The main author is supported by a Ph.D. grant from University Carlos III de Madrid. We offer our gratitude and special thanks to Francisco Javier García Polo, for his generous and invaluable comments during the revision of this paper.
References
- Real-time heuristic search, Artificial Intelligence (1990)
- et al., Controlling the learning process of real-time heuristic search, Artificial Intelligence (2003)
- et al., Lifelong planning A*, Artificial Intelligence (2004)
- et al., MAPGEN: mixed-initiative planning and scheduling for the Mars exploration rover mission, IEEE Intell. Syst. (2004)
- K. Rajan, C. McGann, F. Py, H. Thomas, Robust mission planning using deliberative autonomy for autonomous underwater...
- H.L.S. Younes, M.L. Littman, PPDDL1.0: An extension to PDDL for expressing planning domains with probabilistic effects,...
- S. Sanner, Relational dynamic influence diagram language (RDDL): Language description,...
- M.A. Peot, D.E. Smith, Conditional nonlinear planning, in: M. Kaufmann (Ed.), Proceedings of the First International...
- et al., mGPT: A probabilistic planner based on heuristic search, J. Artificial Intelligence Res. (2005)
- S.W. Yoon, A. Fern, R. Givan, S. Kambhampati, Probabilistic planning via determinization in hindsight, in: Proceedings...
- Probabilistically reusing plans in deterministic planning
- A robust layered control system for a mobile robot, IEEE J. Robot. Autom.
- Object-oriented design of the subsumption architecture, Softw. Pract. Exp.
- Real-time collision-free motion planning of a mobile robot using a neural dynamics-based approach, Int. J. Robot. Autom.
- A fuzzy perceptual model for ultrasound sensors applied to intelligent navigation of mobile robots, Appl. Intell.
- The fast downward planning system, J. Artificial Intelligence Res.
Moisés Martínez obtained his Bachelor in Computer Science and Engineering from Universidad Carlos III de Madrid in 2011 and his master in Computer Science and Technology in 2010 with a specialization in artificial intelligence from the same university. He has held a research fellowship since 2011 in the Planning and Learning Group of Universidad Carlos III de Madrid, where he is currently pursuing his Ph.D. in artificial intelligence. He is also a Professor in the bachelor in Computer Science and Engineering, where he teaches two courses: Machine Learning and Artificial Intelligence in VideoGames. His research fields are Robotics and Automated Planning, and he has published several papers on these topics in conference proceedings.
Fernando Fernández has been a Professor of Computer Science at Universidad Carlos III de Madrid since 2005. He received his Ph.D. degree in Computer Science from University Carlos III of Madrid (UC3M) in 2003. He received his B.Sc. in 1999 from UC3M, also in Computer Science. From 2001, he held assistant and associate professor positions at UC3M. In the fall of 2000, Fernando was a visiting student at the Center for Engineering Science Advanced Research at Oak Ridge National Laboratory (Tennessee). He was also a postdoctoral fellow at the Computer Science Department of Carnegie Mellon University from October 2004 until December 2005. He is the recipient of a pre-doctoral FPU fellowship award from the Spanish Ministry of Education (MEC), a Doctoral Prize from UC3M, and a MEC-Fulbright postdoctoral Fellowship. He has more than 30 journal and conference papers, mainly in the field of machine learning and planning. He has advised 3 Ph.D. theses.
Daniel Borrajo has been a Professor of Computer Science at Universidad Carlos III de Madrid since 1998. He received his Ph.D. in Computer Science in 1990 and his B.S. in Computer Science, both at Universidad Politécnica de Madrid. He has published over 150 journal and conference papers, mainly in the fields of problem solving methods (heuristic search, automated planning and game playing) and machine learning. He has been the PI of and/or participated in over 30 research projects and networks funded at all levels (regional, national and European). He has been the Program Co-chair of the International Conference on Automated Planning and Scheduling (ICAPS’13), Conference Co-chair of the Symposium on Combinatorial Search (SoCS’12, SoCS’11) and ICAPS’06, Chair of the Spanish conference on AI (CAEPIA’07), and PC member of conferences such as IJCAI (senior PC at IJCAI’07, ’11 and ’13), AAAI, ICAPS, ICML, and ECML. He has advised 14 Ph.D. theses, and is currently a member and treasurer of the ICAPS Council.