Abstract
Missing data are very common in the spatial–temporal traffic data collected by various detectors, and accurately imputing the missing values is particularly important in intelligent transportation systems. Because tensor decomposition methods are well suited to multidimensional data imputation, in this paper we treat missing traffic speed data imputation as a tensor decomposition problem and propose a three-process framework based on spatial–temporal regularized tensor decomposition, which imputes the missing traffic speed data by exploiting their hidden spatial–temporal characteristics and underlying structure. Specifically, we first propose a high-precision initialization method based on the low-rank tensor completion model. The experimental results show that this initialization of the tensor decomposition gives good imputation performance. Then, we design a threshold and flexibly choose the truncation rank in the truncated higher-order singular value decomposition, to obtain a core tensor of appropriate size and better capture the characteristics of each dimension. Finally, we apply these features and add regularization terms related to the time intervals within one day and the locations of the road detectors, and the missing traffic speed data are estimated by spatial–temporal regularized Tucker decomposition (STRTD). In addition to the element-like random missing (EM) and fiber-like random missing (FM) scenarios, our experiments also create a region-like random missing (RM) scenario that imitates real-world data loss. Experiments on real-world traffic speed data sets show that our STRTD model outperforms state-of-the-art imputation models, even at high missing rates.
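The rank-selection step can be illustrated with a minimal NumPy sketch of a truncated higher-order singular value decomposition. It is a sketch under stated assumptions, not the authors' implementation: the abstract does not give the threshold rule, so the cumulative-energy criterion in `choose_rank`, the default `threshold` value, and all function names here are hypothetical. The sketch also assumes the input tensor is complete, for example after an initialization step has filled in the missing entries.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Mode-n product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def choose_rank(singular_values, threshold):
    """Smallest rank whose cumulative squared singular values reach
    `threshold` of the total (ASSUMED criterion, not taken from the paper)."""
    energy = np.cumsum(singular_values ** 2)
    if energy[-1] == 0:
        return 1
    ratio = energy / energy[-1]
    rank = int(np.searchsorted(ratio, threshold)) + 1
    return min(rank, len(singular_values))


def truncated_hosvd(tensor, threshold=0.95):
    """Return (core, factors), truncating each mode by the energy threshold."""
    factors = []
    for mode in range(tensor.ndim):
        u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :choose_rank(s, threshold)])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto the kept basis
    return core, factors


def reconstruct(core, factors):
    """Multiply the core by every factor matrix to rebuild the full tensor."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


if __name__ == "__main__":
    # Hypothetical layout: (detectors, days, time intervals per day).
    rng = np.random.default_rng(0)
    speeds = rng.random((20, 7, 48))
    core, factors = truncated_hosvd(speeds, threshold=0.95)
    print(core.shape, [u.shape for u in factors])
```

A lower `threshold` gives a smaller core tensor and a coarser approximation, while a value close to 1 keeps nearly all of the variation in each mode.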
Funding
This paper was supported by the National Natural Science Foundation of China (62076143, 62202270), in part by the Shandong Excellent Young Scientists Fund (Oversea) (2022HWYQ-044), and in part by the Fundamental research promotion plan of Qilu University of Technology (Shandong Academy of Sciences) (No. 2021JC02009).
Author information
Contributions
XD conceived of the presented idea. HX wrote the main manuscript text and performed the experiments. HX completed all the charts and tables. YG further improved the experiment and revised the manuscript. XD made a comprehensive revision of the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflicts of interest
On behalf of all authors, the corresponding author declares that there is no conflict of interest.
About this article
Cite this article
Xie, H., Gong, Y. & Dong, X. Spatial–temporal regularized tensor decomposition method for traffic speed data imputation. Int J Data Sci Anal 17, 203–223 (2024). https://doi.org/10.1007/s41060-023-00412-w
DOI: https://doi.org/10.1007/s41060-023-00412-w