ABSTRACT
Tangled Program Graphs (TPG) is a framework for genetic programming that has shown promise in challenging reinforcement learning problems with discrete action spaces. The approach has recently been extended to incorporate temporal memory mechanisms that enable operation in environments with partial observability at multiple timescales. Here we propose a highly modular memory structure that manages temporal properties of a task and enables operation in problems with continuous action spaces. This significantly broadens the scope of real-world applications for TPGs, from continuous-action reinforcement learning to time series forecasting. We begin by testing the new algorithm on a suite of symbolic regression benchmarks. Next, we evaluate the method in three challenging time series forecasting problems. Results generally match the quality of state-of-the-art solutions in both domains. In the case of time series prediction, we show that temporal memory eliminates the need to pre-specify a fixed-size sliding window of previous values, or autoregressive state, which is used by all compared methods. This is significant because it implies that no prior model for a time series is necessary, and the forecaster may adapt more easily if the properties of a series change significantly over time.
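To make the contrast in the abstract concrete, the following is a minimal Python sketch of the two input styles. It is not the TPG algorithm or its memory structure. The names `sliding_window_dataset` and `StatefulForecaster` are hypothetical, and simple exponential smoothing is used only as a stand-in for "a model that carries its own memory between steps".

```python
from typing import List, Optional, Sequence, Tuple


def sliding_window_dataset(
    series: Sequence[float], window: int
) -> List[Tuple[List[float], float]]:
    """Fixed autoregressive state: pair the previous `window` values with the next value.

    The window length must be chosen before training.
    """
    return [
        (list(series[t - window:t]), series[t])
        for t in range(window, len(series))
    ]


class StatefulForecaster:
    """Hypothetical stand-in for a forecaster with internal memory.

    Assumption: exponential smoothing replaces the evolved memory described
    in the abstract. The model sees one value per step and keeps a running
    summary, so no window length is specified by the user.
    """

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha
        self.memory: Optional[float] = None  # persists across calls to step()

    def step(self, value: float) -> float:
        """Consume one observation, update memory, return the next-step forecast."""
        if self.memory is None:
            self.memory = value
        else:
            self.memory = self.alpha * value + (1 - self.alpha) * self.memory
        return self.memory


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0]

    # Windowed approach: each sample carries an explicit history of length 2.
    for history, target in sliding_window_dataset(series, window=2):
        print(history, "->", target)

    # Stateful approach: one value in per step, history lives inside the model.
    model = StatefulForecaster(alpha=0.5)
    for value in series:
        print(value, "->", model.step(value))
```

In the windowed version the history length is a hyperparameter baked into every sample. In the stateful version the only input is the current value, which is the property the abstract attributes to temporal memory.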
Index Terms
- A modular memory framework for time series prediction