Weighted Ensemble Methods for Predicting Train Delays

  • Conference paper
Computational Science and Its Applications – ICCSA 2020 (ICCSA 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12249)

Abstract

Train delays have become a serious and common problem in rail services because of the increasing number of passengers and limited rail network capacity, so being able to predict train delays accurately is essential for train controllers to devise appropriate plans to prevent or reduce some delays. This paper presents a machine learning ensemble framework to improve the accuracy and consistency of train delay prediction. The basic idea is to train many different types of machine learning models for each station along a chosen journey of a train service, using historical data and relevant weather data, and then to select some of these models by certain criteria to build an ensemble. The ensemble combines the outputs of its member models with an aggregation function to produce the final prediction. Two aggregation functions were devised: averaging and weighted averaging. These ensembles were implemented within a framework, and their performance was tested on data from an intercity train service as a case study. Accuracy was measured by two percentages: correct predictions of a train's arrival time, and correct predictions within one minute of the actual arrival time. The mean accuracies and standard deviations are 42.3% (±11.24) for the individual models, 57.8% (±3.56) for the averaging ensembles, and 72.8% (±0.99) for the weighted ensembles. For predictions within one minute of the actual times, they are 86.4% (±14.05), 94.6% (±1.34), and 96.0% (±0.47) respectively. Overall, the ensembles significantly improved not only the prediction accuracy but also the consistency, and the weighted ensembles are clearly the best.
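The two aggregation functions and the two accuracy measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the weights are derived or how a prediction is judged "correct", so the function names, the use of non-negative weights normalised by their sum, and the rounding of predictions to whole minutes are all assumptions made for illustration.

```python
from typing import Sequence


def average_ensemble(predictions: Sequence[float]) -> float:
    """Plain averaging: every member model contributes equally."""
    if not predictions:
        raise ValueError("need at least one member prediction")
    return sum(predictions) / len(predictions)


def weighted_ensemble(predictions: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted averaging: each member's output is scaled by its weight.

    Assumption: weights are non-negative scores (for example, a member's
    accuracy on held-out data) and are normalised by their sum here.
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, predictions)) / total


def accuracy_within(predicted: Sequence[float],
                    actual: Sequence[int],
                    tolerance_min: int = 0) -> float:
    """Fraction of predictions within `tolerance_min` minutes of the actual value.

    Assumption: predictions are rounded to whole minutes before comparison;
    tolerance_min=0 counts exact matches, tolerance_min=1 counts predictions
    within one minute.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    hits = sum(1 for p, a in zip(predicted, actual)
               if abs(round(p) - a) <= tolerance_min)
    return hits / len(actual)


# Hypothetical outputs (minutes of delay) of three member models for one train.
member_outputs = [2.0, 4.0, 3.0]
member_weights = [3.0, 2.0, 1.0]  # hypothetical weights, higher = more trusted

plain = average_ensemble(member_outputs)
weighted = weighted_ensemble(member_outputs, member_weights)

# Hypothetical ensemble predictions against actual delays for three trains.
exact = accuracy_within([2.8, 5.2, 0.4], [3, 4, 0], tolerance_min=0)
within_one = accuracy_within([2.8, 5.2, 0.4], [3, 4, 0], tolerance_min=1)
```

With equal weights, `weighted_ensemble` reduces to `average_ensemble`, so plain averaging is the special case in which no member is trusted more than another.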

Acknowledgement

The authors would like to thank Mr. Douglas Fraser in particular for his important work in gathering the data and for the advice he gave during this research, and WeatherQuest for providing the weather data for this project. We acknowledge the foundational work carried out by two MSc students at the time, Mr. Bradley Thompson and Ms. Mary Symons. In addition, we appreciate the support and advice given by people from the train operating company Greater Anglia, Network Rail, the Rail Delivery Group, and the Rail Safety and Standards Board (RSSB), as well as the grant awarded through the rail big data sandbox competition in 2017. Finally, we would like to thank Albaha University for providing a studentship for Mr. Mostafa Al Ghamdi to do his PhD at the University of East Anglia.

Author information

Corresponding author

Correspondence to Mostafa Al Ghamdi.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Al Ghamdi, M., Parr, G., Wang, W. (2020). Weighted Ensemble Methods for Predicting Train Delays. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12249. Springer, Cham. https://doi.org/10.1007/978-3-030-58799-4_43

  • DOI: https://doi.org/10.1007/978-3-030-58799-4_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58798-7

  • Online ISBN: 978-3-030-58799-4

  • eBook Packages: Computer Science, Computer Science (R0)
