Abstract
We investigate the problem of analysing train movements in large-scale railway networks in order to understand and predict their behaviour. We focus on several important aspects: the Running Time of a train between two stations, the Dwell Time of a train in a station, the Train Delay, the Penalty Costs associated with a delay, and the Train Overtaking between two trains that are in the wrong relative position on the railway network. Two main approaches to these problems exist in the literature. One is based on knowledge of the network and the experience of the operators. The other is based on the analysis of historical data about the network with advanced data analytics methods. In this paper, we propose a hybrid approach that addresses the limitations of the current solutions. Experience-based models are interpretable and robust, but they cannot take into account all the factors that influence train movements, which results in low accuracy. Data-driven models, on the other hand, are usually neither easy to interpret nor robust to infrequent events, and they require a representative amount of data, which is not always available if the phenomenon under examination changes too fast. Results on real-world data from the Italian railway network show that the proposed solution outperforms both state-of-the-art experience-based and data-driven systems in terms of interpretability, robustness, ability to handle nonrecurring events and changes in the behaviour of the network, and ability to consider complex and exogenous information.
Notes
Throughout the paper, the real names of the stations have been anonymized for confidentiality reasons.
Throughout the paper, the real identifiers of the trains have been anonymized for confidentiality reasons.
The detailed and complete structure of the top-level decision tree cannot be reported for confidentiality reasons.
Google Compute https://cloud.google.com/products/.
For confidentiality reasons, we cannot report the results and the ids for all the available sections and checkpoints.
References
Albrecht, T.: Reducing power peaks and energy consumption in rail transit systems by simultaneous train running time control. WIT Trans. State-of-the-Art Sci. Eng. 39, 3–12 (2010)
Anaissi, A., Khoa, N.L.D., Wang, Y.: Automated parameter tuning in one-class support vector machine: an application for damage detection. Int. J. Data Sci. Anal. 6(4), 311–325 (2018)
Badi, H., Fadhel, M., Sabry, S., Jasem, M.: Retraction note to: a survey on human–computer interaction technologies and techniques. Int. J. Data Sci. Anal. 3(2), 149–149 (2017)
Barta, J., Rizzoli, A.E., Salani, M., Gambardella, L.M.: Statistical modelling of delays in a rail freight transportation network. In: Proceedings of the Winter Simulation Conference (2012)
Berger, A., Gebhardt, A., Müller-Hannemann, M., Ostrowski, M.: Stochastic delay prediction in large train networks. In: OASIcs-OpenAccess Series in Informatics, vol. 20 (2011)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Brünger, O., Dahlhaus, E.: Railway Timetable and Traffic-Analysis, Modelling, Simulation. Eurail Press, Utrecht (2008)
Bryan, J., Weisbrod, G.E., Martland, C.D.: Rail Freight Solutions to Roadway Congestion: Final Report and Guidebook. Transportation Research Board, Washington, DC (2007)
Cao, L.: Data science and analytics: a new era. Int. J. Data Sci. Anal. 1(1), 1–2 (2016)
Daamen, W., Goverde, R.M.P., Hansen, I.A.: Non-discriminatory automatic registration of knock-on train delays. Netw. Spat. Econ. 9(1), 47–61 (2009)
D’Ariano, A.: Improving Real-Time Train Dispatching: Models, Algorithms and Applications. TRAIL Research School, Netherlands (2008)
D’Ariano, A., Albrecht, T., Allan, J., Brebbia, C.A., Rumsey, A.F., Sciutto, G., Sone, S.: Running time re-optimization during real-time timetable perturbations. Timetable Plan. Inf. Qual. 1, 147–156 (2010)
D’Ariano, A., Pranzo, M.: An advanced real-time train dispatching system for minimizing the propagation of delays in a dispatching area under severe disturbances. Netw. Spat. Econ. 9(1), 63–84 (2009)
D’Ariano, A., Pranzo, M., Hansen, I.A.: Conflict resolution and train speed coordination for solving real-time timetable perturbations. IEEE Trans. Intell. Transp. Syst. 8(2), 208–222 (2007)
Fang, W., Yang, S., Yao, X.: A survey on problem models and solution approaches to rescheduling in railway networks. IEEE Trans. Intell. Transp. Syst. 16(6), 2997–3016 (2015)
Flier, H., Gelashvili, R., Graffagnino, T., Nunkesser, M.: Mining Railway Delay Dependencies in Large-Scale Real-World Delay Data. Robust and Online Large-Scale Optimization. Springer, Berlin (2009)
Ghofrani, F., He, Q., Goverde, R.M., Liu, X.: Recent applications of big data analytics in railway transportation systems: a survey. Transp. Res. Part C Emerg. Technol. 90, 226–246 (2018)
Goverde, R.M.P.: A delay propagation algorithm for large-scale railway traffic networks. Transp. Res. Part C Emerg. Technol. 18(3), 269–287 (2010)
Goverde, R.M.P., Meng, L.: Advanced monitoring and management information of railway operations. J. Rail Transp. Plan. Manag. 1(2), 69–79 (2011)
Hansen, I.A., Goverde, R.M.P., Van Der Meer, D.J.: Online train delay recognition and running time prediction. In: IEEE Conference on Intelligent Transportation Systems, pp. 1783–1788 (2010)
Kecman, P., Goverde, R.M.P.: Process mining of train describer event data and automatic conflict identification. Comput. Railw. XIII Comput. Sys. Des. Oper. Railw. Other Transit Syst. 127, 227 (2013)
Kecman, P., Goverde, R.M.P.: Online data-driven adaptive prediction of train event times. IEEE Trans. Intell. Transp. Syst. 16(1), 465–474 (2015)
Ko, H., Koseki, T., Miyatake, M.: Application of dynamic programming to the optimization of the running profile of a train. WIT Trans. Built Environ. 74. https://doi.org/10.2495/CR040111 (2004)
Kougka, G., Gounaris, A., Simitsis, A.: The many faces of data-centric workflow optimization: a survey. Int. J. Data Sci. Anal. 6(2), 81–107 (2018)
Lamorgese, L., Mannino, C.: An exact decomposition approach for the real-time train dispatching problem. Oper. Res. 63(1), 48–64 (2015)
Lukaszewicz, P.: Energy consumption and running time for trains. Ph.D. thesis, Railway Technology, Department of Vehicle Engineering, Royal Institute of Technology, Stockholm (2001)
Lulli, A., Oneto, L., Canepa, R., Petralli, S., Anguita, D.: Large-scale railway networks train movements: a dynamic, interpretable, and robust hybrid data analytics system. In: IEEE International Conference on Data Science and Advanced Analytics (2018)
Marković, N., Milinković, S., Tikhonov, K.S., Schonfeld, P.: Analyzing passenger train arrival delays with support vector regression. Transp. Res. Part C Emerg. Technol. 56, 251–262 (2015)
Marquez, F.P.G., Lewis, R.W., Tobias, A.M., Roberts, C.: Life cycle costs for railway condition monitoring. Transp. Res. Part E Logist. Transp. Rev. 44(6), 1175–1187 (2008)
Milinković, S., Marković, M., Vesković, S., Ivić, M., Pavlović, N.: A fuzzy petri net model to estimate train delays. Simul. Model. Pract. Theory 33, 144–157 (2013)
Moniz, N., Branco, P., Torgo, L.: Resampling strategies for imbalanced time series forecasting. Int. J. Data Sci. Anal. 3(3), 161–181 (2017)
Nowakowski, T.: Analysis of modern trends of logistics technology development. Arch. Civ. Mech. Eng. 11(3), 699–706 (2011)
Oneto, L.: Model selection and error estimation without the agonizing pain. WIREs Data Min. Knowl. Discov. 8(4), e1252 (2018)
Oneto, L., Fumeo, E., Clerico, G., Canepa, R., Papa, F., Dambra, C., Mazzino, N., Anguita, D.: Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout. IEEE Trans. Syst. Man Cybern. Syst. 47(10), 2754–2767 (2017)
Oneto, L., Fumeo, E., Clerico, G., Canepa, R., Papa, F., Dambra, C., Mazzino, N., Anguita, D.: Advanced analytics for train delay prediction systems by including exogenous weather data. In: IEEE International Conference on Data Science and Advanced Analytics (2016)
Regione, L.: Weather Data of Regione Liguria. https://www.arpal.gov.it (2018). Accessed 14 Jan 2019
Regione, L.: Weather Data of Regione Lombardia. http://www.arpalombardia.it (2018). Accessed 14 Jan 2019
Regione, L.: Weather Data of Regione Piemonte. http://www.arpa.piemonte.it (2018). Accessed 14 Jan 2019
Restel, F.: The Markov reliability and safety model of the railway transportation system. In: Safety and Reliability: Methodology and Applications-Proceedings of the European Safety and Reliability Conference (2014)
Salloum, S., Dautov, R., Chen, X., Peng, P.X., Huang, J.Z.: Big data analytics on apache spark. Int. J. Data Sci. Anal. 1(3), 145–164 (2016)
Trabo, I., Landex, A., Nielsen, O.A., Schneider-Tilli, J.E.: Cost benchmarking of railway projects in Europe—can it help to reduce costs? In: International Seminar on Railway Operations Modelling and Analysis-RailCopenhagen (2013)
Tsai, T.H., Lee, C.K., Wei, C.H.: Neural network based temporal feature models for short-term railway passenger demand forecasting. Expert Syst. Appl. 36(2), 3728–3736 (2009)
Wang, R., Work, D.B.: Data driven approaches for passenger train delay estimation. In: IEEE Conference on Intelligent Transportation Systems, pp. 535–540 (2015)
Weihs, C., Ickstadt, K.: Data science: the impact of statistics. Int. J. Data Sci. Anal. 6(3), 189–194 (2018)
Acknowledgements
This research has been supported by the European Union through the projects IN2DREAMS (European Union’s Horizon 2020 research and innovation programme under grant agreement 777596) and In2Rail (European Union’s Horizon 2020 research and innovation programme under grant agreement 635900).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This paper is an extended version of the DSAA’2018 Application Track paper titled “Large-Scale Railway Networks Train Movements: a Dynamic, Interpretable, and Robust Hybrid Data Analytics System” [27].
Cite this article
Oneto, L., Buselli, I., Lulli, A. et al. A dynamic, interpretable, and robust hybrid data analytics system for train movements in large-scale railway networks. Int J Data Sci Anal 9, 95–111 (2020). https://doi.org/10.1007/s41060-018-00171-z
DOI: https://doi.org/10.1007/s41060-018-00171-z