AdaRNN: Adaptive Learning and Forecasting of Time Series

ABSTRACT
Time series data are ubiquitous in the real world and are notoriously difficult to forecast. Because their statistical properties change over time, their distributions also shift temporally, which causes a severe distribution shift problem for existing methods. Yet modeling time series from this distributional perspective remains unexplored. In this paper, we term this problem Temporal Covariate Shift (TCS) and propose Adaptive RNNs (AdaRNN) to tackle it by building an adaptive model that generalizes well to unseen test data. AdaRNN is composed of two novel algorithms applied in sequence. First, we propose Temporal Distribution Characterization to better characterize the distribution information in the time series. Second, we propose Temporal Distribution Matching to reduce the distribution mismatch in the time series and learn an adaptive model. AdaRNN is a general framework into which flexible distribution distances can be integrated. Experiments on human activity recognition, air quality prediction, and financial analysis show that AdaRNN outperforms the latest methods by 2.6% in classification accuracy and significantly reduces RMSE by 9.0%. We also show that the temporal distribution matching algorithm can be extended to the Transformer architecture to boost its performance.
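As a concrete illustration of one distribution distance that such a matching framework can plug in, below is a minimal NumPy sketch of the (biased) squared maximum mean discrepancy (MMD) with an RBF kernel, a standard choice for comparing two sets of samples, e.g. hidden representations from two time-series segments. The function names and the fixed bandwidth are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Pairwise RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)
    # for every row a of x and row b of y.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=1.0):
    # Biased estimate of the squared maximum mean discrepancy between
    # the samples x (n, d) and y (m, d); zero iff the kernel mean
    # embeddings of the two empirical distributions coincide.
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())
```

In a distribution-matching setup, a term like `mmd2` between representations of different time periods would typically be added to the forecasting loss so that training penalizes temporal distribution mismatch.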