A dynamic, interpretable, and robust hybrid data analytics system for train movements in large-scale railway networks

  • Regular Paper
  • International Journal of Data Science and Analytics

Abstract

We investigate the problem of analysing train movements in large-scale railway networks in order to understand and predict their behaviour. We focus on five aspects: the Running Time of a train between two stations, the Dwell Time of a train in a station, the Train Delay, the Penalty Costs associated with a delay, and the Train Overtaking between two trains that are in the wrong relative position on the railway network. Two main approaches to these problems exist in the literature. One is based on knowledge of the network and the experience of the operators. The other is based on the analysis of historical data about the network with advanced data analytics methods. In this paper, we propose a hybrid approach that addresses the limitations of both. Experience-based models are interpretable and robust, but they cannot take into account all the factors that influence train movements, which results in low accuracy. Data-driven models, on the other hand, are usually neither easy to interpret nor robust to infrequent events, and they require a representative amount of data, which is not always available if the phenomenon under examination changes too fast. Results on real-world data from the Italian railway network show that the proposed solution outperforms both state-of-the-art experience-based and data-driven systems in terms of interpretability, robustness, ability to handle nonrecurring events and changes in the behaviour of the network, and ability to consider complex and exogenous information.
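
As a rough illustration of three of the quantities named in the abstract, the sketch below computes a Dwell Time, a Running Time, and a Train Delay from timestamped station events. The record layout, the field names, and the choice to measure delay at arrival are assumptions made for this example only; they are not taken from the paper's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StationEvent:
    """One train at one station. All field names are hypothetical."""
    station: str
    scheduled_arrival: datetime
    actual_arrival: datetime
    actual_departure: datetime


def dwell_time(event: StationEvent) -> timedelta:
    # Dwell Time: how long the train stays in the station.
    return event.actual_departure - event.actual_arrival


def running_time(origin: StationEvent, destination: StationEvent) -> timedelta:
    # Running Time: from departure at one station to arrival at the next.
    return destination.actual_arrival - origin.actual_departure


def arrival_delay(event: StationEvent) -> timedelta:
    # Train Delay (assumed here to be measured at arrival):
    # actual minus scheduled time, negative if the train is early.
    return event.actual_arrival - event.scheduled_arrival


# Made-up timestamps for two consecutive anonymized stations.
t0 = datetime(2018, 1, 1, 8, 0)
station_a = StationEvent("A", t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=5))
station_b = StationEvent(
    "B",
    t0 + timedelta(minutes=20),
    t0 + timedelta(minutes=24),
    t0 + timedelta(minutes=26),
)

print(dwell_time(station_a))               # time spent in station A
print(running_time(station_a, station_b))  # time between A and B
print(arrival_delay(station_b))            # delay on arrival at B
```

Penalty Costs and Train Overtaking depend on contractual rules and network topology that the abstract does not specify, so they are left out of this sketch.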

Notes

  1. Throughout the paper, the real names of the stations have been anonymized for confidentiality reasons.

  2. Throughout the paper, the real identifiers of the trains have been anonymized for confidentiality reasons.

  3. http://www.rfi.it/rfi/SERVIZI-E-MERCATO/Accesso-alla-rete/Prospetto-informativo-della-rete.

  4. The detailed and complete structure of the top-level decision tree cannot be reported because of confidentiality issues.

  5. Google Compute https://cloud.google.com/products/.

  6. Because of confidentiality issues, we cannot report the results and the ids for all the sections and all the checkpoints available.

References

  1. Albrecht, T.: Reducing power peaks and energy consumption in rail transit systems by simultaneous train running time control. WIT Trans. State-of-the-Art Sci. Eng. 39, 3–12 (2010)

  2. Anaissi, A., Khoa, N.L.D., Wang, Y.: Automated parameter tuning in one-class support vector machine: an application for damage detection. Int. J. Data Sci. Anal. 6(4), 311–325 (2018)

  3. Badi, H., Fadhel, M., Sabry, S., Jasem, M.: Retraction note to: a survey on human–computer interaction technologies and techniques. Int. J. Data Sci. Anal. 3(2), 149–149 (2017)

  4. Barta, J., Rizzoli, A.E., Salani, M., Gambardella, L.M.: Statistical modelling of delays in a rail freight transportation network. In: Proceedings of the Winter Simulation Conference (2012)

  5. Berger, A., Gebhardt, A., Müller-Hannemann, M., Ostrowski, M.: Stochastic delay prediction in large train networks. In: OASIcs-OpenAccess Series in Informatics, vol. 20 (2011)

  6. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  7. Brünger, O., Dahlhaus, E.: Railway Timetable and Traffic-Analysis, Modelling, Simulation. Eurail Press, Utrecht (2008)

  8. Bryan, J., Weisbrod, G.E., Martland, C.D.: Rail Freight Solutions to Roadway Congestion: Final Report and Guidebook. Transportation Research Board, Washington, DC (2007)

  9. Cao, L.: Data science and analytics: a new era. Int. J. Data Sci. Anal. 1(1), 1–2 (2016)

  10. Daamen, W., Goverde, R.M.P., Hansen, I.A.: Non-discriminatory automatic registration of knock-on train delays. Netw. Spat. Econ. 9(1), 47–61 (2009)

  11. D’Ariano, A.: Improving Real-Time Train Dispatching: Models, Algorithms and Applications. TRAIL Research School, Netherlands (2008)

  12. D’Ariano, A., Albrecht, T., Allan, J., Brebbia, C.A., Rumsey, A.F., Sciutto, G., Sone, S.: Running time re-optimization during real-time timetable perturbations. Timetable Plan. Inf. Qual. 1, 147–156 (2010)

  13. D’Ariano, A., Pranzo, M.: An advanced real-time train dispatching system for minimizing the propagation of delays in a dispatching area under severe disturbances. Netw. Spat. Econ. 9(1), 63–84 (2009)

  14. D’Ariano, A., Pranzo, M., Hansen, I.A.: Conflict resolution and train speed coordination for solving real-time timetable perturbations. IEEE Trans. Intell. Transp. Syst. 8(2), 208–222 (2007)

  15. Fang, W., Yang, S., Yao, X.: A survey on problem models and solution approaches to rescheduling in railway networks. IEEE Trans. Intell. Transp. Syst. 16(6), 2997–3016 (2015)

  16. Flier, H., Gelashvili, R., Graffagnino, T., Nunkesser, M.: Mining Railway Delay Dependencies in Large-Scale Real-World Delay Data. Robust and Online Large-Scale Optimization. Springer, Berlin (2009)

  17. Ghofrani, F., He, Q., Goverde, R.M., Liu, X.: Recent applications of big data analytics in railway transportation systems: a survey. Transp. Res. Part C Emerg. Technol. 90, 226–246 (2018)

  18. Goverde, R.M.P.: A delay propagation algorithm for large-scale railway traffic networks. Transp. Res. Part C Emerg. Technol. 18(3), 269–287 (2010)

  19. Goverde, R.M.P., Meng, L.: Advanced monitoring and management information of railway operations. J. Rail Transp. Plan. Manag. 1(2), 69–79 (2011)

  20. Hansen, I.A., Goverde, R.M.P., Van Der Meer, D.J.: Online train delay recognition and running time prediction. In: IEEE Conference on Intelligent Transportation Systems, pp. 1783–1788 (2010)

  21. Kecman, P., Goverde, R.M.P.: Process mining of train describer event data and automatic conflict identification. Comput. Railw. XIII Comput. Sys. Des. Oper. Railw. Other Transit Syst. 127, 227 (2013)

  22. Kecman, P., Goverde, R.M.P.: Online data-driven adaptive prediction of train event times. IEEE Trans. Intell. Transp. Syst. 16(1), 465–474 (2015)

  23. Ko, H., Koseki, T., Miyatake, M.: Application of dynamic programming to the optimization of the running profile of a train. WIT Trans. Built Environ. 74. https://doi.org/10.2495/CR040111 (2004)

  24. Kougka, G., Gounaris, A., Simitsis, A.: The many faces of data-centric workflow optimization: a survey. Int. J. Data Sci. Anal. 6(2), 81–107 (2018)

  25. Lamorgese, L., Mannino, C.: An exact decomposition approach for the real-time train dispatching problem. Oper. Res. 63(1), 48–64 (2015)

  26. Lukaszewicz, P.: Energy consumption and running time for trains. Ph.D. thesis, Railway Technology, Department of Vehicle Engineering, Royal Institute of Technology, Stockholm (2001)

  27. Lulli, A., Oneto, L., Canepa, R., Petralli, S., Anguita, D.: Large-scale railway networks train movements: a dynamic, interpretable, and robust hybrid data analytics system. In: IEEE International Conference on Data Science and Advanced Analytics (2018)

  28. Marković, N., Milinković, S., Tikhonov, K.S., Schonfeld, P.: Analyzing passenger train arrival delays with support vector regression. Transp. Res. Part C Emerg. Technol. 56, 251–262 (2015)

  29. Marquez, F.P.G., Lewis, R.W., Tobias, A.M., Roberts, C.: Life cycle costs for railway condition monitoring. Transp. Res. Part E Logist. Transp. Rev. 44(6), 1175–1187 (2008)

  30. Milinković, S., Marković, M., Vesković, S., Ivić, M., Pavlović, N.: A fuzzy petri net model to estimate train delays. Simul. Model. Pract. Theory. 33, 144–157 (2013)

  31. Moniz, N., Branco, P., Torgo, L.: Resampling strategies for imbalanced time series forecasting. Int. J. Data Sci. Anal. 3(3), 161–181 (2017)

  32. Nowakowski, T.: Analysis of modern trends of logistics technology development. Arch. Civ. Mech. Eng. 11(3), 699–706 (2011)

  33. Oneto, L.: Model selection and error estimation without the agonizing pain. WIREs Data Min. Knowl. Discov. 8(4), e1252 (2018)

  34. Oneto, L., Fumeo, E., Clerico, G., Canepa, R., Papa, F., Dambra, C., Mazzino, N., Anguita, D.: Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout. IEEE Trans. Syst. Man Cybern. Syst. 47(10), 2754–2767 (2017)

  35. Oneto, L., Fumeo, E., Clerico, G., Canepa, R., Papa, F., Dambra, C., Mazzino, N., Anguita, D.: Advanced analytics for train delay prediction systems by including exogenous weather data. In: IEEE International Conference on Data Science and Advanced Analytics (2016)

  36. Regione, L.: Weather Data of Regione Liguria. https://www.arpal.gov.it (2018). Accessed 14 Jan 2019

  37. Regione, L.: Weather Data of Regione Lombardia. http://www.arpalombardia.it (2018). Accessed 14 Jan 2019

  38. Regione, L.: Weather Data of Regione Piemonte. http://www.arpa.piemonte.it (2018). Accessed 14 Jan 2019

  39. Restel, F.: The Markov reliability and safety model of the railway transportation system. In: Safety and Reliability: Methodology and Applications-Proceedings of the European Safety and Reliability Conference (2014)

  40. Salloum, S., Dautov, R., Chen, X., Peng, P.X., Huang, J.Z.: Big data analytics on apache spark. Int. J. Data Sci. Anal. 1(3), 145–164 (2016)

  41. Trabo, I., Landex, A., Nielsen, O.A., Schneider-Tilli, J.E.: Cost benchmarking of railway projects in Europe—can it help to reduce costs? In: International Seminar on Railway Operations Modelling and Analysis-RailCopenhagen (2013)

  42. Tsai, T.H., Lee, C.K., Wei, C.H.: Neural network based temporal feature models for short-term railway passenger demand forecasting. Expert Syst. Appl. 36(2), 3728–3736 (2009)

  43. Wang, R., Work, D.B.: Data driven approaches for passenger train delay estimation. In: IEEE Conference on Intelligent Transportation Systems, pp. 535–540 (2015)

  44. Weihs, C., Ickstadt, K.: Data science: the impact of statistics. Int. J. Data Sci. Anal. 6(3), 189–194 (2018)

Acknowledgements

This research has been supported by the European Union through the projects IN2DREAMS (European Union’s Horizon 2020 research and innovation programme under grant agreement 777596) and In2Rail (European Union’s Horizon 2020 research and innovation programme under grant agreement 635900).

Author information

Corresponding author

Correspondence to Luca Oneto.

Additional information

This paper is an extended version of the DSAA’2018 Application Track paper titled “Large-Scale Railway Networks Train Movements: a Dynamic, Interpretable, and Robust Hybrid Data Analytics System” [27].

About this article

Cite this article

Oneto, L., Buselli, I., Lulli, A. et al. A dynamic, interpretable, and robust hybrid data analytics system for train movements in large-scale railway networks. Int J Data Sci Anal 9, 95–111 (2020). https://doi.org/10.1007/s41060-018-00171-z

  • DOI: https://doi.org/10.1007/s41060-018-00171-z
