Multistate time series imputation using generative adversarial network with applications to traffic data

  • Original Article
Neural Computing and Applications

Abstract

Missing data in time series is a pervasive problem in many fields, especially in intelligent transportation systems, where it hinders the application of time series analysis methods and the fine adjustment of control strategies. Prevalent imputation approaches reconstruct missing data with high accuracy by exploiting a precise distribution model. However, the multistate characteristic of time series data and the uncertainty of the imputation process make the temporal data distribution harder to model and reduce imputation performance. In this paper, a novel time series generative adversarial imputation network (TGAIN) model is proposed to deal with the problem of missing time series data. The model combines the advantages of GAN-based data distribution modeling with the uncertainty handling of multiple imputation. Specifically, the TGAIN network is designed and adversarially trained to learn the multistate distribution of missing time series data. Through the conditional vector constraint and the adversarial imputation process, the latent distribution for each missing position under different states can be effectively estimated from implicit relationships with the partially observed information. A corresponding multiple imputation strategy is then proposed to deal with the uncertainty of the imputation process and to determine the best fill value from the learned distribution. Furthermore, extensive experiments have been conducted on two real traffic flow datasets. The comparative results show that the proposed TGAIN not only models time series data distributions and handles imputation uncertainty better, but also performs more robustly and stably even as the missing rate increases.
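The general idea of drawing several candidate fills from a learned distribution and then choosing one value per missing position can be sketched as follows. This is a minimal illustration, not the TGAIN implementation: the abstract does not specify how the best fill value is selected, so the median over draws is an assumption, and `toy_sampler` is a hypothetical stand-in for a trained generator. The names `combine`, `multiple_impute`, and `n_draws` are likewise illustrative.

```python
import numpy as np


def combine(observed, mask, generated):
    """Keep observed entries (mask == 1); use generated values elsewhere."""
    return np.where(mask == 1, observed, generated)


def multiple_impute(observed, mask, sampler, n_draws=20, rng=None):
    """Draw several imputations and summarize them per position.

    Returns the per-position median (assumed here as the fill value)
    and the standard deviation across draws as a rough uncertainty measure.
    """
    rng = np.random.default_rng() if rng is None else rng
    draws = np.stack(
        [combine(observed, mask, sampler(observed, mask, rng)) for _ in range(n_draws)]
    )
    fill = np.median(draws, axis=0)
    spread = draws.std(axis=0)
    return fill, spread


def toy_sampler(observed, mask, rng):
    """Hypothetical generator stand-in: mean of observed values plus Gaussian noise."""
    mean = observed[mask == 1].mean()
    return mean + rng.normal(scale=1.0, size=observed.shape)


# Toy traffic flow series; NaN marks a missing reading, mask == 1 marks an observed one.
flow = np.array([10.0, 12.0, np.nan, 11.0, np.nan, 13.0])
mask = np.array([1, 1, 0, 1, 0, 1])

fill, spread = multiple_impute(
    flow, mask, toy_sampler, n_draws=50, rng=np.random.default_rng(0)
)
print(fill)
print(spread)
```

Observed positions pass through unchanged and have zero spread, while each missing position receives a summary of its sampled candidates. In a GAN-based setting, the sampler would be a trained generator conditioned on the observed values and the mask.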

Data availability

All datasets and code supporting the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Key Program) (52131202) and the Natural Science Foundation of Jilin Province (20190201107JC). The authors would like to thank the Digital Roadway Interactive Visualization and Evaluation Network (DRIVENet) for providing the traffic volume data used to validate this methodology.

Author information

Corresponding author

Correspondence to Qiaowen Bai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Cao, Q., Bai, Q. et al. Multistate time series imputation using generative adversarial network with applications to traffic data. Neural Comput & Applic 35, 6545–6567 (2023). https://doi.org/10.1007/s00521-022-07961-4

  • DOI: https://doi.org/10.1007/s00521-022-07961-4
