Abstract
Train delays have become a serious and common problem in rail services because of growing passenger numbers and limited network capacity. Accurate delay prediction is therefore essential for train controllers, who need it to devise plans that prevent or reduce delays. This paper presents a machine learning ensemble framework that improves the accuracy and consistency of train delay prediction. The basic idea is to train many different types of machine learning models for each station along a chosen journey of a train service, using historical data and relevant weather data, and then to select some of these models, according to certain criteria, to build an ensemble. The ensemble combines the outputs of its member models with an aggregation function to produce the final prediction. Two aggregation functions were devised: averaging and weighted averaging. The ensembles were implemented in a framework and tested on data from an intercity train service as a case study. Accuracy was measured by two percentages: predictions that match the actual arrival time of a train exactly, and predictions that fall within one minute of the actual arrival time. For exact predictions, the mean accuracies and standard deviations are 42.3% (±11.24) for the individual models, 57.8% (±3.56) for the averaging ensembles, and 72.8% (±0.99) for the weighted ensembles. For predictions within one minute of the actual times, they are 86.4% (±14.05), 94.6% (±1.34) and 96.0% (±0.47) respectively. Overall, the ensembles significantly improved both the accuracy and the consistency of the predictions, and the weighted ensembles are clearly the best.
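The two aggregation functions and the two accuracy measures described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy predictions and the choice of weights (taken here as hypothetical per-model validation accuracies) are all assumptions, since the abstract does not say how the weights are derived or which models are selected.

```python
import numpy as np

def average_ensemble(preds):
    """Plain averaging: preds has shape (n_models, n_trains), in minutes."""
    return np.mean(np.asarray(preds, dtype=float), axis=0)

def weighted_ensemble(preds, weights):
    """Weighted averaging: each model's output is scaled by its weight.

    ASSUMPTION: one non-negative weight per model, for example its
    accuracy on held-out data. The abstract does not specify this.
    """
    preds = np.asarray(preds, dtype=float)
    w = np.asarray(weights, dtype=float)
    return w @ preds / w.sum()

def accuracy(predicted, actual, tolerance=0):
    """Fraction of predictions within `tolerance` minutes of the actual
    value, after rounding predictions to whole minutes."""
    predicted = np.rint(np.asarray(predicted, dtype=float))
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual) <= tolerance))

# Toy data (hypothetical): 3 models predicting arrival delay for 4 trains.
preds = [[2, 0, 5, 1],
         [4, 1, 3, 1],
         [3, 2, 4, 3]]
actual = [3, 1, 4, 1]
weights = [0.2, 0.5, 0.3]  # hypothetical validation accuracies

avg = average_ensemble(preds)
wavg = weighted_ensemble(preds, weights)
print("exact accuracy (averaging):", accuracy(avg, actual))
print("within one minute (averaging):", accuracy(avg, actual, tolerance=1))
print("exact accuracy (weighted):", accuracy(wavg, actual))
```

With `tolerance=0` the `accuracy` function corresponds to the exact-match measure, and with `tolerance=1` to the within-one-minute measure. Normalising by `w.sum()` keeps the weighted prediction on the same scale as the member outputs even when the weights do not sum to one.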
Acknowledgement
The authors would particularly like to thank Mr. Douglas Fraser for his important work in gathering the data and for his advice during this research, and WeatherQuest for providing the weather data for this project. We acknowledge the foundational work carried out by two MSc students at the time, Mr. Bradley Thompson and Ms Mary Symons. In addition, we greatly appreciate the support and advice given by people from the train operating company Greater Anglia, Network Rail, the Rail Delivery Group, and the Rail Safety and Standards Board (RSSB) for the grant awarded through the rail big data sandbox competition in 2017. Finally, we would like to thank Albaha University for providing a studentship for Mr Mostafa Al Ghamdi to pursue his PhD at the University of East Anglia.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Al Ghamdi, M., Parr, G., Wang, W. (2020). Weighted Ensemble Methods for Predicting Train Delays. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12249. Springer, Cham. https://doi.org/10.1007/978-3-030-58799-4_43
DOI: https://doi.org/10.1007/978-3-030-58799-4_43
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58798-7
Online ISBN: 978-3-030-58799-4
eBook Packages: Computer Science (R0)