Weighted Ensemble Methods for Predicting Train Delays

Al Ghamdi, Mostafa; Parr, Gerard; Wang, Wenjia

doi:10.1007/978-3-030-58799-4_43

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12249)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Train delays have become a serious and common problem in rail services because of the increasing number of passengers and limited rail network capacity, so predicting them accurately is essential if train controllers are to devise appropriate plans to prevent or reduce delays. This paper presents a machine learning ensemble framework to improve the accuracy and consistency of train delay prediction. The basic idea is to train many different types of machine learning models for each station along a chosen journey of a train service, using historical data and relevant weather data, and then to select some of these models according to certain criteria to build an ensemble. The ensemble combines the outputs of its member models with an aggregation function to produce the final prediction. Two aggregation functions were devised for this purpose: averaging and weighted averaging. The ensembles were implemented within the framework, and their performance was tested on data from an intercity train service as a case study. Accuracy was measured by two percentages: predictions that match the actual arrival time of a train, and predictions that fall within one minute of the actual arrival time. For correct predictions, the mean accuracies (with standard deviations) are 42.3% (±11.24) for the individual models, 57.8% (±3.56) for the averaging ensembles, and 72.8% (±0.99) for the weighted ensembles. For predictions within one minute of the actual times, they are 86.4% (±14.05), 94.6% (±1.34) and 96.0% (±0.47) respectively. Overall, the ensembles significantly improved both the accuracy and the consistency of the predictions, and the weighted ensembles are clearly the best.
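The two aggregation functions and the two accuracy measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the weights are derived or how a "correct" prediction is defined, so the weights below are hypothetical scores, rounding to the nearest whole minute is an assumption, and all function names and sample values are made up for illustration.

```python
from typing import Sequence


def average(predictions: Sequence[float]) -> float:
    """Plain averaging: every member model contributes equally."""
    return sum(predictions) / len(predictions)


def weighted_average(predictions: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Weighted averaging: each prediction is scaled by its model's weight.

    The weights are hypothetical (for example, a validation score per model);
    the abstract does not state how the paper computes them.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, predictions)) / total


def accuracy(predicted: Sequence[float], actual: Sequence[int],
             tolerance: int = 0) -> float:
    """Percentage of predictions within `tolerance` minutes of the actual value.

    Assumption: a prediction is rounded to the nearest whole minute first, so
    tolerance=0 means "correct" and tolerance=1 means "within one minute".
    """
    hits = sum(1 for p, a in zip(predicted, actual)
               if abs(round(p) - a) <= tolerance)
    return 100.0 * hits / len(actual)


# Made-up delays (minutes) predicted by three member models for one arrival.
member_predictions = [2.0, 4.0, 9.0]
member_weights = [5.0, 3.0, 2.0]  # hypothetical weights, not from the paper

print(average(member_predictions))                           # 5.0
print(weighted_average(member_predictions, member_weights))  # 4.0

# Made-up ensemble outputs for four arrivals against the actual delays.
ensemble_output = [3.2, 5.9, 0.4, 7.0]
actual_delays = [3, 5, 0, 9]
print(accuracy(ensemble_output, actual_delays))               # 50.0
print(accuracy(ensemble_output, actual_delays, tolerance=1))  # 75.0
```

In this toy case the weighted average moves away from the outlying third model because that model carries the smallest weight, which is the kind of effect a weighting scheme is meant to produce.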

Acknowledgement

The authors would like to thank Mr Douglas Fraser in particular for his important work in gathering the data and for his advice on this research, and WeatherQuest for providing the weather data for this project. We acknowledge the foundational work carried out by two MSc students at the time, Mr Bradley Thompson and Ms Mary Symons. In addition, we greatly appreciate the support and advice given by people from the train operating company Greater Anglia, Network Rail, the Rail Delivery Group, and the Rail Safety and Standards Board (RSSB), as well as the grant awarded through the rail big data sandbox competition in 2017. Finally, we would like to thank Albaha University for providing a studentship for Mr Mostafa Al Ghamdi to do his PhD at the University of East Anglia.

Author information

Authors and Affiliations

School of Computing Sciences, University of East Anglia, Norwich, UK
Mostafa Al Ghamdi, Gerard Parr & Wenjia Wang

Corresponding author

Correspondence to Mostafa Al Ghamdi.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Potenza, Italy
Beniamino Murgante
Chair- Center of ICT/ICE, Covenant University, Ota, Nigeria
Sanjay Misra
University of Cagliari, Cagliari, Italy
Chiara Garau
University of Cagliari, Cagliari, Italy
Ivan Blečić
Clayton School of Information Technology, Monash University, Clayton, VIC, Australia
David Taniar
Department of Information Science, Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
University of Minho, Braga, Portugal
Ana Maria A.C. Rocha
Polytechnic University of Bari, Bari, Italy
Eufemia Tarantino
Polytechnic University of Bari, Bari, Italy
Carmelo Maria Torre
Department of Neurology, University of Massachusetts Medical School, Worcester, MA, USA
Yeliz Karaca

About this paper

Cite this paper

Al Ghamdi, M., Parr, G., Wang, W. (2020). Weighted Ensemble Methods for Predicting Train Delays. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12249. Springer, Cham. https://doi.org/10.1007/978-3-030-58799-4_43

DOI: https://doi.org/10.1007/978-3-030-58799-4_43
Published: 01 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58798-7
Online ISBN: 978-3-030-58799-4
eBook Packages: Computer Science, Computer Science (R0)
