DOI: 10.1145/2588555.2588556
research-article

AutoPlait: automatic mining of co-evolving time sequences

Published: 18 June 2014

Abstract

Given a large collection of co-evolving time-series that contains an unknown number of patterns of different durations, how can we efficiently and effectively find typical patterns and the points of variation? How can we statistically summarize all the sequences and achieve a meaningful segmentation? In this paper we present AutoPlait, a fully automatic mining algorithm for co-evolving time sequences. Our method has the following properties: (a) effectiveness: it operates on large collections of time-series and finds similar segment groups that agree with human intuition; (b) scalability: it is linear in the input size, and thus scales up very well; and (c) parameter-free operation: it requires no user intervention, no prior training, and no parameter tuning. Extensive experiments on 67GB of real datasets demonstrate that AutoPlait does indeed detect meaningful patterns correctly, and that it outperforms state-of-the-art competitors in both accuracy and speed: AutoPlait achieves near-perfect precision and recall (over 95%), and it is up to 472 times faster than its competitors.
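
To make the precision and recall figures concrete, the sketch below shows one way segmentation accuracy could be scored against ground-truth cut points. It is an illustration only and not the evaluation protocol used in the paper: the function name `cut_point_precision_recall`, the `tolerance` window, the greedy one-to-one matching, and the sample cut points are all assumptions introduced here.

```python
def cut_point_precision_recall(predicted, truth, tolerance=0):
    """Score predicted segment boundaries against ground-truth boundaries.

    Hypothetical helper: a predicted cut point counts as correct if it lies
    within `tolerance` time ticks of a true cut point that has not already
    been matched. Matching is greedy, so each true cut point is used once.
    """
    unmatched = sorted(truth)
    hits = 0
    for p in sorted(predicted):
        # Closest true cut point that is still unmatched (None if none left).
        best = min(unmatched, key=lambda t: abs(t - p), default=None)
        if best is not None and abs(best - p) <= tolerance:
            hits += 1
            unmatched.remove(best)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Made-up cut points (time-tick indices), purely for illustration.
precision, recall = cut_point_precision_recall(
    predicted=[100, 205, 400], truth=[100, 200, 300], tolerance=10
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case two of the three predicted cut points fall within the tolerance window of a true cut point, so both precision and recall are 2/3. A stricter or looser `tolerance` changes the scores, which is why any reported accuracy figure depends on the matching rule behind it.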

    Published In

    SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
    June 2014
    1645 pages
    ISBN: 9781450323765
    DOI: 10.1145/2588555
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automatic mining
    2. time-series data

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'14
    Acceptance Rates

    SIGMOD '14 paper acceptance rate: 107 of 421 submissions (25%)
    Overall acceptance rate: 785 of 4,003 submissions (20%)

    Cited By

    • (2025) ISSD: Indicator Selection for Time Series State Detection. Proceedings of the ACM on Management of Data 3:1 (1-25). DOI: 10.1145/3709698. Online publication date: 11-Feb-2025
    • (2024) Application of Online Automated Segmentation and Evaluation Method in Anomaly Detection at Rail Profile Based on Pattern Matching and Complex Networks. ISIJ International 64:10 (1528-1537). DOI: 10.2355/isijinternational.ISIJINT-2024-003. Online publication date: 15-Aug-2024
    • (2024) Raising the ClaSS of Streaming Time Series Segmentation. Proceedings of the VLDB Endowment 17:8 (1953-1966). DOI: 10.14778/3659437.3659450. Online publication date: 1-Apr-2024
    • (2024) Detecting State Correlations between Heterogeneous Time Series. Proceedings of the 2024 2nd International Conference on Advances in Artificial Intelligence and Applications (131-137). DOI: 10.1145/3712623.3712645. Online publication date: 20-Dec-2024
    • (2024) E2Usd: Efficient-yet-effective Unsupervised State Detection for Multivariate Time Series. Proceedings of the ACM Web Conference 2024 (3010-3021). DOI: 10.1145/3589334.3645593. Online publication date: 13-May-2024
    • (2024) A Data-Driven Three-Stage Adaptive Pattern Mining Approach for Multi-Energy Loads. IEEE Transactions on Knowledge and Data Engineering 36:12 (7455-7467). DOI: 10.1109/TKDE.2024.3462770. Online publication date: Dec-2024
    • (2024) Change Point Detection in Multi-Channel Time Series via a Time-Invariant Representation. IEEE Transactions on Knowledge and Data Engineering 36:12 (7743-7756). DOI: 10.1109/TKDE.2023.3347356. Online publication date: Dec-2024
    • (2024) Static and Streaming Discovery of Maximal Linear Representation Between Time Series. IEEE Transactions on Knowledge and Data Engineering 36:1 (401-415). DOI: 10.1109/TKDE.2023.3287273. Online publication date: Jan-2024
    • (2024) Predictive Clustering of Vessel Behavior Based on Hierarchical Trajectory Representation. IEEE Transactions on Intelligent Transportation Systems 25:12 (19496-19506). DOI: 10.1109/TITS.2024.3445496. Online publication date: Dec-2024
    • (2024) Multivariate Time Series Clustering for Environmental State Characterization of Ground-Based Gravitational-Wave Detectors. 2024 IEEE International Conference on Big Data (BigData) (4145-4152). DOI: 10.1109/BigData62323.2024.10825388. Online publication date: 15-Dec-2024
