ABSTRACT
Given a collection of seasonal time series, how can we find regular (cyclic) patterns and outliers (i.e., rare events)? These two types of patterns are hidden and mixed together in the time-varying activities. How can we robustly separate regular patterns from outliers without requiring any prior information?
We present CycloneM, a unifying model that captures both cyclic patterns and outliers, and CycloneFact, a novel algorithm that solves the above problem. We also present AutoCyclone, an automatic mining framework based on CycloneM and CycloneFact. Our method has the following properties: (a) effective: it captures important cyclic features such as trend and seasonality, and clearly distinguishes regular patterns from rare events; (b) robust and accurate: it detects these features and patterns accurately even in the presence of outliers; (c) fast: CycloneFact takes time linear in the data size and typically converges in a few iterations; (d) parameter-free: the modeling framework frees the user from having to provide parameter values.
Extensive experiments on four real datasets demonstrate the benefits of the proposed model and algorithm: the model captures latent cyclic patterns, trends, and rare events, and the algorithm outperforms existing state-of-the-art approaches. CycloneFact was up to 5 times more accurate and 20 times faster than its top competitors.
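The abstract does not define CycloneM or CycloneFact, so the following is not the authors' algorithm. It is a minimal sketch of the general idea of separating a cyclic component from sparse rare events, under stated assumptions: a single strictly positive sequence with a known period, folded into a cycles × period matrix (the paper itself works on collections of sequences, i.e., tensors), and a multiplicative rank-1 model (per-cycle level × seasonal profile) fit by alternating medians rather than any method from the paper. The names `fold`, `robust_cyclic_split`, and `threshold` are hypothetical.

```python
from statistics import median


def fold(series, period):
    """Fold a 1-D series into a (cycles x period) matrix, one row per cycle."""
    if period <= 0 or len(series) % period:
        raise ValueError("series length must be a positive multiple of period")
    return [list(series[i:i + period]) for i in range(0, len(series), period)]


def robust_cyclic_split(series, period, threshold, iters=10):
    """Split a positive seasonal series into level * profile plus sparse outliers.

    Assumed model (illustrative only): X[i][j] ~ level[i] * profile[j] + outlier[i][j],
    where level[i] is a per-cycle scale (trend) and profile[j] the seasonal shape.
    Medians instead of means keep isolated spikes from dragging the fit.
    """
    if any(x <= 0 for x in series):
        raise ValueError("this sketch assumes strictly positive values")
    X = fold(series, period)
    cycles = len(X)
    level = [1.0] * cycles
    profile = [1.0] * period
    for _ in range(iters):
        # Seasonal profile: median ratio across cycles, separately for each phase.
        profile = [median([X[i][j] / level[i] for i in range(cycles)])
                   for j in range(period)]
        # Per-cycle level: median ratio across phases within each cycle.
        level = [median([X[i][j] / profile[j] for j in range(period)])
                 for i in range(cycles)]
    # Entries the rank-1 cyclic part cannot explain are candidate rare events.
    outliers = [(i, j, X[i][j] - level[i] * profile[j])
                for i in range(cycles) for j in range(period)
                if abs(X[i][j] - level[i] * profile[j]) > threshold]
    return level, profile, outliers


if __name__ == "__main__":
    shape = [1.0, 3.0, 5.0, 3.0]                               # seasonal shape, period 4
    series = [lvl * s for lvl in range(1, 6) for s in shape]   # 5 cycles, rising trend
    series[3 * 4 + 1] += 40.0                                  # inject one rare event
    _, _, found = robust_cyclic_split(series, period=4, threshold=5.0)
    print([(i, j) for i, j, _ in found])                       # [(3, 1)]
```

In the usage example, the injected spike is one value out of five in its phase and one out of four in its cycle, so the medians ignore it and only that entry exceeds the threshold. Unlike what the abstract claims for AutoCyclone, this sketch still needs the user to supply the period and the threshold.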
Index Terms
- AutoCyclone: Automatic Mining of Cyclic Online Activities with Robust Tensor Factorization