DOI: 10.1145/2872518.2891061
tutorial

Mining Big Time-series Data on the Web

Published: 11 April 2016

Abstract

Online news, blogs, SNS, and many other Web-based services have been attracting considerable interest for business and marketing purposes. Given a large collection of time series, such as web-click logs, online search queries, and blog and review entries, how can we efficiently and effectively find typical time-series patterns? What are the major tools for mining, forecasting, and outlier detection? Time-series data analysis is becoming increasingly important, thanks to the decreasing cost of hardware and the increasing online processing capability.
The objective of this tutorial is to provide a concise and intuitive overview of the most important tools that can help us find meaningful patterns in large-scale time-series data. Specifically, we review the state of the art in three related fields: (1) similarity search, pattern discovery, and summarization; (2) non-linear modeling and forecasting; and (3) the extension of time-series mining and tensor analysis. We also introduce case studies that illustrate their practical use for social media and Web-based services.
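As a rough illustration of what similarity search over a collection of time series can look like, the sketch below computes the classic dynamic time warping (DTW) distance and uses it for a brute-force nearest-neighbor lookup. It is not taken from the tutorial itself: the function names `dtw_distance` and `most_similar`, the absolute-difference cost, and the toy data are all assumptions made for this example, and the quadratic-time recurrence shown here is the textbook formulation rather than any of the faster indexed methods.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Uses absolute difference as the local cost (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def most_similar(query, collection):
    """Brute-force search: return (index, distance) of the closest series under DTW."""
    distances = [dtw_distance(query, series) for series in collection]
    best = min(range(len(collection)), key=lambda k: distances[k])
    return best, distances[best]


if __name__ == "__main__":
    # Hypothetical toy data: the second series is a time-stretched copy of the query.
    query = [1, 2, 3]
    collection = [[5, 5, 5], [1, 2, 2, 3], [0, 0, 0]]
    print(most_similar(query, collection))
```

Because DTW allows one point to align with several, a stretched copy of the query is found at distance zero, which a point-by-point Euclidean comparison could not do for sequences of different lengths.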


    Published In

    WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web
    April 2016
    1094 pages
    ISBN:9781450341448
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. forecasting
    2. pattern discovery
    3. tensors
    4. time-series

    Qualifiers

    • Tutorial

    Funding Sources

    • Japan Society for the Promotion of Science
    • National Science Foundation

    Conference

    WWW '16
    Sponsor:
    • IW3C2
    WWW '16: 25th International World Wide Web Conference
    April 11 - 15, 2016
    Montréal, Québec, Canada

    Acceptance Rates

    WWW '16 Companion Paper Acceptance Rate 115 of 727 submissions, 16%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2022) Scalable Analytics on Large Sequence Collections. 2022 23rd IEEE International Conference on Mobile Data Management (MDM), pp. 5-8. DOI: 10.1109/MDM55031.2022.00022. Online publication date: Jun-2022
    • (2019) Dynamic Modeling and Forecasting of Time-evolving Data Streams. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 458-468. DOI: 10.1145/3292500.3330947. Online publication date: 25-Jul-2019
    • (2018) StreamScope. Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, pp. 1-8. DOI: 10.1145/3211954.3211959. Online publication date: 10-Jun-2018
    • (2017) Nonlinear Dynamics of Information Diffusion in Social Networks. ACM Transactions on the Web, 11:2, pp. 1-40. DOI: 10.1145/3057741. Online publication date: 24-Apr-2017
    • (2017) Ecosystem on the Web. World Wide Web, 20:3, pp. 439-465. DOI: 10.1007/s11280-016-0389-x. Online publication date: 1-May-2017
