Constructing comprehensive summaries of large event sequences

Authors:
Jerry Kiernan, IBM, San Jose, CA, USA
Evimaria Terzi, IBM Almaden, San Jose, CA, USA

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2008, Pages 417–425. https://doi.org/10.1145/1401890.1401943

Published: 24 August 2008

ABSTRACT

Event sequences capture system and user activity over time. Prior research on sequence mining has mostly focused on discovering local patterns. Though interesting, these patterns reveal only local associations and fail to give a comprehensive summary of the entire event sequence. Moreover, the number of patterns discovered can be large. In this paper, we take an alternative approach and build short summaries that describe the entire sequence while still revealing local associations among events.

We formally define summarization as an optimization problem that balances the shortness of the summary against the accuracy of the data description. We show that this problem can be solved optimally in polynomial time by using a combination of two dynamic-programming algorithms. We also explore more efficient greedy alternatives and demonstrate that they work well on large datasets. Experiments on both synthetic and real datasets illustrate that our algorithms are efficient, produce high-quality results, and reveal interesting local structures in the data.
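To make the trade-off between summary length and description accuracy concrete, the following is a minimal sketch of a generic segmentation dynamic program in the spirit of Bellman's classic formulation. It is not the authors' algorithm: the abstract does not give the cost function, so the fixed per-segment penalty `model_bits`, the Bernoulli encoding of each event type, and the function names `bernoulli_bits` and `summarize` are all assumptions made for illustration. The input is assumed to be a binary matrix with one row per event type and one column per timestamp.

```python
import math
from typing import List, Tuple


def bernoulli_bits(ones: int, total: int) -> float:
    """Bits to encode `total` binary outcomes, `ones` of them 1,
    under the maximum-likelihood Bernoulli model (assumed data cost)."""
    if ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return -(ones * math.log2(p) + (total - ones) * math.log2(1 - p))


def summarize(events: List[List[int]], model_bits: float = 8.0) -> Tuple[float, List[int]]:
    """Split timestamps 0..n-1 into contiguous segments minimizing
    (model_bits per segment) + (data bits per event type per segment).

    Returns the optimal total cost and the exclusive end index of each segment.
    `model_bits` is a hypothetical stand-in for the cost of describing a segment.
    """
    n = len(events[0])

    # prefix[r][t] = number of occurrences of event type r before timestamp t
    prefix = [[0] * (n + 1) for _ in events]
    for r, row in enumerate(events):
        for t, x in enumerate(row):
            prefix[r][t + 1] = prefix[r][t] + x

    def segment_cost(i: int, j: int) -> float:
        # Cost of a single segment covering timestamps i..j-1.
        data = sum(bernoulli_bits(p[j] - p[i], j - i) for p in prefix)
        return model_bits + data

    # best[j] = minimum cost of summarizing the first j timestamps
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(i, j)
            if cost < best[j]:
                best[j], back[j] = cost, i

    # Follow back-pointers to recover the segment boundaries.
    bounds = []
    j = n
    while j > 0:
        bounds.append(j)
        j = back[j]
    bounds.reverse()
    return best[n], bounds
```

For example, `summarize([[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]], model_bits=2.0)` places a single boundary at timestamp 4, because each half is then described exactly by its segment model. With n timestamps and m event types, this sketch evaluates O(n^2) candidate segments at O(m) cost each; raising `model_bits` favors shorter summaries, lowering it favors a more accurate description.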

Published in
KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2008
1116 pages
ISBN:9781605581934
DOI:10.1145/1401890
General Chair: Ying Li, Microsoft adCenter Labs
Program Chairs: Bing Liu, University of Illinois at Chicago; Sunita Sarawagi, Indian Institute of Technology, Bombay
Copyright © 2008 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
dynamic programming
event sequences
log mining
minimum description length
summarization
Qualifiers
- research-article
Acceptance Rates
KDD '08 paper acceptance rate: 118 of 593 submissions (20%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).