DOI: 10.1145/2723372.2731081

Mining and Forecasting of Big Time-series Data

Published: 27 May 2015

Abstract

Given a large collection of time series, such as web-click logs, electronic medical records, and motion-capture sensor data, how can we efficiently and effectively find typical patterns? How can we statistically summarize all the sequences and achieve a meaningful segmentation? What are the major tools for forecasting and outlier detection? Time-series data analysis is becoming increasingly important, thanks to the decreasing cost of hardware and the increasing online processing capability.
The objective of this tutorial is to provide a concise and intuitive overview of the most important tools that can help us find patterns in large-scale time-series sequences. We review the state of the art in four related fields: (1) similarity search and pattern discovery, (2) linear modeling and summarization, (3) non-linear modeling and forecasting, and (4) the extension of time-series mining and tensor analysis. The tutorial emphasizes the intuition behind these powerful tools, which is often lost in the technical literature, and introduces case studies that illustrate their practical use.


    Published In

    SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
    May 2015
    2110 pages
    ISBN:9781450327589
    DOI:10.1145/2723372

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. forecasting
    2. pattern discovery
    3. tensors
    4. time-series

    Qualifiers

    • Research-article

    Funding Sources

    • JSPS
    • NSF
    • ARL

    Conference

    SIGMOD/PODS'15: International Conference on Management of Data
    May 31 - June 4, 2015
    Melbourne, Victoria, Australia

    Acceptance Rates

    SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%
