DOI:10.1145/1031171.1031208
Article

Framework and algorithms for trend analysis in massive temporal data sets

Published: 13 November 2004

Abstract

Mining massive temporal data streams for significant trends, emerging buzz, and unusually high or low activity is an important problem with several commercial applications. In this paper, we propose a framework based on relational records and metric spaces to study such problems. Our framework provides the necessary mathematical underpinnings for this genre of problems, and leads to efficient algorithms in the stream/sort model of massive data sets (where the algorithm makes passes over the data, computes a new stream on the fly, and is allowed to sort the intermediate data). Our algorithm makes novel use of metric approximations in the data stream context, and highlights the role of hierarchical organization of large data sets in designing efficient algorithms in the stream/sort model.
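
The stream/sort model mentioned in the abstract alternates streaming passes with sorts of the intermediate data. The following is a minimal sketch of that pattern only, not the paper's algorithm: the record layout `(topic, timestamp)`, the `window` parameter, and all function names are assumptions made for illustration, and the in-memory `sorted` call stands in for the external sort a truly massive data set would need.

```python
from itertools import groupby


def stream_pass(records, window):
    """Streaming pass: map each (topic, timestamp) record to a
    (topic, time bucket) key, producing a new stream on the fly."""
    for topic, timestamp in records:
        yield (topic, timestamp // window)


def count_sorted(keys):
    """Streaming pass over a sorted stream: equal keys are adjacent,
    so each run can be counted with constant extra memory."""
    for key, run in groupby(keys):
        yield key, sum(1 for _ in run)


def activity_by_bucket(records, window):
    """Hypothetical driver: pass, sort, pass."""
    intermediate = stream_pass(records, window)  # first streaming pass
    ordered = sorted(intermediate)               # sort step (in memory here)
    return list(count_sorted(ordered))           # second streaming pass


# Toy input: (topic, timestamp) pairs in arrival order.
records = [("a", 0), ("b", 1), ("a", 2), ("a", 11), ("b", 12), ("a", 3)]
result = activity_by_bucket(records, window=10)
```

Counts per topic and time bucket, as computed here, are one possible raw input for spotting unusually high or low activity. The paper's framework of relational records, metric approximations, and hierarchies is not modeled in this sketch.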

Published In

CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
November 2004
678 pages
ISBN:1581138741
DOI:10.1145/1031171

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data stream algorithms
  2. hierarchically partitioned data
  3. metric approximations
  4. taxonomies
  5. trend analysis

Qualifiers

  • Article

Conference

CIKM04: Conference on Information and Knowledge Management
November 8 - 13, 2004
Washington, D.C., USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2021) HST+: An Efficient Index for Embedding Arbitrary Metric Spaces. 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 648-659. DOI:10.1109/ICDE51399.2021.00062. Online publication date: Apr-2021
  • (2019) Last-mile delivery made practical. Proceedings of the VLDB Endowment, 13:3, pages 320-333. DOI:10.14778/3368289.3368297. Online publication date: 1-Nov-2019
  • (2019) ITISS: an efficient framework for querying big temporal data. GeoInformatica. DOI:10.1007/s10707-019-00362-1. Online publication date: 22-May-2019
  • (2018) Distributed In-Memory Analytics for Big Temporal Data. Database Systems for Advanced Applications, pages 549-565. DOI:10.1007/978-3-319-91452-7_36. Online publication date: 13-May-2018
  • (2014) Fast and accurate detection of changes in data streams. Statistical Analysis and Data Mining, 7:2, pages 125-139. DOI:10.1002/sam.11216. Online publication date: 1-Apr-2014
  • (2011) Heuristic methods for automating event detection on sensor data in near real-time. 2011 IEEE International Multi-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA), pages 1-8. DOI:10.1109/COGSIMA.2011.5753446. Online publication date: Feb-2011
  • (2009) Automatic detection of trends in time-stamped sequences: an evolutionary approach. Soft Computing, 14:3, pages 211-227. DOI:10.1007/s00500-008-0395-8. Online publication date: 14-Jan-2009
  • (2007) Mining evolving data streams for frequent patterns. Pattern Recognition, 40:2, pages 492-503. DOI:10.1016/j.patcog.2006.03.006. Online publication date: 1-Feb-2007
  • (2007) A Survey of Change Diagnosis Algorithms in Evolving Data Streams. Data Streams, pages 85-102. DOI:10.1007/978-0-387-47534-9_5. Online publication date: 2007
  • (2005) Multi-structural databases. Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 184-195. DOI:10.1145/1065167.1065191. Online publication date: 13-Jun-2005