ABSTRACT
City metro network expansion, a variant of the transportation network design problem, aims to design new lines on top of an existing metro network. Existing methods in transportation network design either (i) struggle to formulate this problem efficiently, (ii) depend on expert guidance to produce solutions, or (iii) rely on problem-specific heuristics that are difficult to design. To address these limitations, we propose a reinforcement learning based method for the city metro network expansion problem. We formulate metro line expansion as a Markov decision process (MDP) that characterizes the problem as sequential station selection. We then train an actor-critic model to design the next metro line on the basis of the existing network. The actor is an encoder-decoder network with an attention mechanism that generates the parameterized policy used to select stations. The critic estimates the expected cumulative reward, assisting the training of the actor by reducing variance. The proposed method requires no expert guidance during design, since the learning procedure relies only on the reward calculation to tune the policy toward better station selection. It also avoids the difficulty of designing heuristics, because station selection is formalized by the learned policy. Considering origin-destination (OD) trips and social equity, we expand the current metro network of Xi'an, China, based on the real mobility information of 24,770,715 mobile phone users across the whole city. The results demonstrate the advantages of our method over existing approaches.
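To make the actor-critic setup described above concrete, the sketch below shows one possible shape of an attention-based actor that selects stations sequentially, together with a critic whose value estimate serves as a baseline. It is purely illustrative and is not the authors' implementation: the station features, network sizes, line length, and the placeholder reward (standing in for the OD-trip and equity objectives) are all assumed for exposition; only the overall actor-critic structure follows the abstract.

```python
# Illustrative sketch only: an attention-based actor that selects stations
# sequentially, with a critic baseline for variance reduction. All sizes,
# features, and the reward below are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

LINE_LENGTH = 10  # assumed number of stations on the new line


class Actor(nn.Module):
    """Encoder-decoder with additive attention; outputs a policy over stations."""

    def __init__(self, feat_dim: int, hid_dim: int = 128):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, hid_dim)   # embeds candidate stations
        self.decoder = nn.GRUCell(hid_dim, hid_dim)   # summarizes the partial line
        self.w_ref = nn.Linear(hid_dim, hid_dim)
        self.w_q = nn.Linear(hid_dim, hid_dim)
        self.v = nn.Linear(hid_dim, 1)

    def forward(self, stations, mask):
        # stations: (N, feat_dim) candidate features; mask: (N,) True = unavailable
        emb = self.encoder(stations)                  # (N, hid_dim)
        h = emb.mean(dim=0)                           # initial decoder state
        mask = mask.clone()
        chosen, log_probs = [], []
        for _ in range(LINE_LENGTH):
            # Additive attention scores over candidate stations.
            scores = self.v(torch.tanh(self.w_ref(emb) + self.w_q(h))).squeeze(-1)
            scores = scores.masked_fill(mask, float("-inf"))  # forbid reselection
            dist = torch.distributions.Categorical(logits=scores)
            idx = dist.sample()
            chosen.append(idx)
            log_probs.append(dist.log_prob(idx))
            mask[idx] = True
            h = self.decoder(emb[idx].unsqueeze(0), h.unsqueeze(0)).squeeze(0)
        return torch.stack(chosen), torch.stack(log_probs).sum()


class Critic(nn.Module):
    """Estimates the expected cumulative reward for the given instance."""

    def __init__(self, feat_dim: int, hid_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, 1)
        )

    def forward(self, stations):
        return self.net(stations).mean()


# One policy-gradient update with the critic value as a baseline.
actor, critic = Actor(feat_dim=4), Critic(feat_dim=4)
optimizer = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=1e-4
)
stations = torch.rand(200, 4)            # hypothetical candidate-station features
mask = torch.zeros(200, dtype=torch.bool)
line, log_prob = actor(stations, mask)
reward = torch.rand(())                  # placeholder for OD coverage + equity score
baseline = critic(stations)
loss = -(reward - baseline.detach()) * log_prob + F.mse_loss(baseline, reward)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this sketch the critic value is subtracted from the reward before the policy-gradient term, which is the variance-reduction role the abstract attributes to the critic; the detach keeps the critic trained only by its own regression loss.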