DOI: 10.1145/3580305.3599257

An Observed Value Consistent Diffusion Model for Imputing Missing Values in Multivariate Time Series

Published: 04 August 2023


Missing values, which are common in multivariate time series, are the most important obstacle to the utilization and interpretation of such data. Considerable effort has gone into accurately imputing missing values in multivariate time series. Existing works either use deep learning networks to produce deterministic imputations, or aim to generate different plausible imputations by sampling multiple noises from the same distribution and then denoising them. However, these models either fall short of modeling the uncertainty of imputations because of their deterministic nature, or perform poorly in terms of interpretability and imputation accuracy because they ignore the correlations between the latent representations of observed and missing values, which are parts of samples from the same distribution. To this end, we explicitly take the correlations between observed and missing values into account and theoretically re-derive the Evidence Lower BOund (ELBO) of the conditional diffusion model for multivariate time series imputation. Based on the newly derived ELBO, we further propose a novel multivariate imputation diffusion model (MIDM) equipped with novel noise sampling, adding, and denoising mechanisms for multivariate time series imputation; together, these newly designed techniques ensure consistency between observed and missing values. Extensive experiments on both multivariate time series imputation and forecasting demonstrate the superiority of the proposed MIDM in generating conditional estimations.
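For orientation, the generic idea of "sampling multiple noises from the same distribution" at the missing positions can be sketched as below. This is a minimal illustration of the standard closed-form forward step of a denoising diffusion model, applied only to entries marked as missing while observed entries are left untouched. It is not MIDM's noise sampling, adding, or denoising mechanism. The function names, the linear schedule, the step count, and the toy data are all assumptions made for this example.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha(betas):
    """alpha_bar[t] = product over s <= t of (1 - beta_s)."""
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def noise_missing_entries(x0, observed, t, alpha_bar, rng):
    """Closed-form forward step applied only where observed[i][j] is False."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    x_t = []
    for row, mask_row in zip(x0, observed):
        new_row = []
        for value, is_observed in zip(row, mask_row):
            if is_observed:
                # Observed values are kept as the conditioning signal.
                new_row.append(value)
            else:
                # Missing targets get Gaussian noise at diffusion step t.
                new_row.append(signal * value + sigma * rng.gauss(0.0, 1.0))
        x_t.append(new_row)
    return x_t

rng = random.Random(0)
alpha_bar = cumulative_alpha(linear_beta_schedule(50))

# Toy window: 2 time steps, 3 variables; False marks an imputation target.
x0 = [[0.5, -1.2, 0.3], [0.7, 0.1, -0.4]]
observed = [[True, False, True], [False, True, True]]

# Several draws from the same noise distribution give different noisy starts.
samples = [noise_missing_entries(x0, observed, 49, alpha_bar, rng) for _ in range(3)]
```

In this generic setup the noise at missing positions is drawn independently of the observed values, which is the kind of independence the abstract argues against.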

Supplementary Material

MP4 File (rtfp1072-2min-promo.mp4)






      Published In

      KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2023
      5996 pages



      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. conditional generation
      2. diffusion model
      3. multivariate time series


      • Research-article


      KDD '23

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%



