Abstract
Sets of features in Markov decision processes can play a critical role in approximately representing the value function and in abstracting the state space. Feature selection is crucial to the success of a system and is most often done by hand. We study the problem of selecting such features automatically, and propose and evaluate a simple approach that reduces the selection of a new feature to standard classification learning. We learn a classifier that predicts the sign of the Bellman error over a training set of states. By iteratively adding the learned classifiers as new features, and training between iterations with approximate value iteration, we find a Tetris feature set that significantly outperforms randomly constructed features and achieves about three-tenths of the highest score obtained with a carefully hand-constructed feature set. We also show that features learned by this method outperform those learned by the earlier method of Patrascu et al. [4] on the SysAdmin domain used for evaluation in that work.
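The feature-discovery loop described above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not the authors' implementation: the 16-state chain MDP, the Bellman error convention BE(s) = max_a [r(s, a) + γV(s')] − V(s), the use of Widrow-Hoff (LMS) updates to fit linear weights inside approximate value iteration, and a one-level decision stump over the state's binary digits standing in for the classifier. All names (`discover_features`, `learn_stump`, and so on) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy MDP (illustration only, not a domain from the paper):
# a 16-state chain, actions move left (-1) or right (+1), and every
# transition into the last state earns reward 1.
N_STATES = 16
GAMMA = 0.9
ACTIONS = (-1, +1)
Feature = Callable[[int], float]


def step(s: int, a: int) -> Tuple[int, float]:
    """Deterministic transition: returns (next_state, reward)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


def attributes(s: int) -> List[int]:
    """Raw attributes the classifier may test: the 4 binary digits of s."""
    return [(s >> i) & 1 for i in range(4)]


def value(w: List[float], feats: List[Feature], s: int) -> float:
    """Linear value estimate V(s) = sum_i w_i * phi_i(s)."""
    return sum(wi * f(s) for wi, f in zip(w, feats))


def backup(w: List[float], feats: List[Feature], s: int) -> float:
    """One-step Bellman backup: max_a [r(s, a) + gamma * V(s')]."""
    return max(r + GAMMA * value(w, feats, s2)
               for s2, r in (step(s, a) for a in ACTIONS))


def approximate_value_iteration(feats, states, sweeps=50, epochs=20, alpha=0.05):
    """Repeatedly fit the weights to backup targets with LMS (Widrow-Hoff) updates."""
    w = [0.0] * len(feats)
    for _ in range(sweeps):
        targets = {s: backup(w, feats, s) for s in states}  # frozen for this sweep
        for _ in range(epochs):
            for s in states:
                phi = [f(s) for f in feats]
                err = targets[s] - sum(wi * p for wi, p in zip(w, phi))
                w = [wi + alpha * err * p for wi, p in zip(w, phi)]
    return w


def bellman_sign_labels(w, feats, states) -> Dict[int, bool]:
    """Label each training state with whether its Bellman error is positive."""
    return {s: backup(w, feats, s) - value(w, feats, s) > 0 for s in states}


def learn_stump(labels: Dict[int, bool]) -> Tuple[int, int, float]:
    """Pick the single test 'bit j == p' that best predicts the labels."""
    best = (0, 1, -1.0)
    for j in range(4):
        for p in (1, 0):
            correct = sum((attributes(s)[j] == p) == y for s, y in labels.items())
            acc = correct / len(labels)
            if acc > best[2]:  # strict '>' keeps the first best test on ties
                best = (j, p, acc)
    return best


def stump_feature(j: int, p: int) -> Feature:
    """Turn a learned stump into a 0/1 feature for the value function."""
    return lambda s: 1.0 if attributes(s)[j] == p else 0.0


def discover_features(iterations: int = 3):
    """Alternate AVI training with adding a Bellman-error-sign classifier as a feature."""
    states = list(range(N_STATES))
    feats: List[Feature] = [lambda s: 1.0]  # start from a bias feature only
    w = approximate_value_iteration(feats, states)
    for _ in range(iterations):
        labels = bellman_sign_labels(w, feats, states)
        if len(set(labels.values())) < 2:  # every state has the same sign
            break
        j, p, _ = learn_stump(labels)
        feats.append(stump_feature(j, p))
        w = approximate_value_iteration(feats, states)
    return feats, w
```

With only the bias feature, every state shares one value estimate, so the first round labels exactly states 14 and 15 (the ones that can collect the reward in one step) as having positive Bellman error. A single-bit stump can only approximate that set; a richer learner such as a decision tree could isolate it exactly, which is one reason the choice of classifier matters in this reduction.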
References
[1] Bellman, R., Kalaba, R., Kotkin, B.: Polynomial approximation – a new computational technique in dynamic programming. Math. Comp. 17(8), 155–161 (1963)
[2] Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
[3] Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
[4] Patrascu, R., Poupart, P., Schuurmans, D., Boutilier, C., Guestrin, C.: Greedy linear value-approximation for factored Markov decision processes. In: AAAI (2002)
[5] Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
[6] Sutton, R.S.: Learning to predict by the methods of temporal differences. MLJ 3, 9–44 (1988)
[7] Sutton, R.S., Barto, A.G.: Reinforcement Learning. MIT Press, Cambridge (1998)
[8] Tesauro, G.: Temporal difference learning and TD-Gammon. Comm. ACM 38(3), 58–68 (1995)
[9] Utgoff, P.E., Precup, D.: Constructive function approximation. In: Motoda, H., Liu, H. (eds.) Feature extraction, construction, and selection: A data-mining perspective, pp. 219–235. Kluwer, Dordrecht (1998)
[10] Widrow, B., Hoff Jr., M.E.: Adaptive switching circuits. IRE WESCON Convention Record, 96–104 (1960)
[11] Williams, R.J., Baird, L.C.: Tight performance bounds on greedy policies based on imperfect value functions. Technical report, Northeastern University (1993)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, JH., Givan, R. (2005). Feature-Discovering Approximate Value Iteration Methods. In: Zucker, JD., Saitta, L. (eds) Abstraction, Reformulation and Approximation. SARA 2005. Lecture Notes in Computer Science, vol 3607. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527862_25
DOI: https://doi.org/10.1007/11527862_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27872-6
Online ISBN: 978-3-540-31882-8
eBook Packages: Computer Science, Computer Science (R0)