Article

Finding simple intensity descriptions from event sequence data

Authors:
Heikki Mannila, Nokia Research Center, FIN-00045 Nokia Group, Finland
Marko Salmenkivi, University of Helsinki, Finland

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, August 2001, Pages 341–346, https://doi.org/10.1145/502512.502562

Published: 26 August 2001

ABSTRACT

Sequences of events are an important type of data arising in various applications, including telecommunications, biostatistics, and web access analysis. A basic approach to modeling such sequences is to find the underlying intensity functions describing the expected number of events per time unit. Typically, the intensity functions are assumed to be piecewise constant. We therefore consider different ways of fitting piecewise constant intensity models to event sequence data. We start by considering a Bayesian approach using Markov chain Monte Carlo (MCMC) methods with a varying number of pieces. These methods can be used to produce posterior distributions on the intensity functions, and they can also accommodate covariates. The drawback is that they are computationally intensive and thus not very suitable for data mining applications in which large numbers of intensity functions have to be estimated. We then consider dynamic programming approaches to finding the change points in the intensity functions. These methods can find the maximum likelihood intensity function in O(n²k) time for a sequence of n events and an intensity function with k pieces. We show that simple heuristics can be used to prune the number of potential change points, yielding speedups of several orders of magnitude. The results of the improved dynamic programming method correspond very closely with the posterior averages produced by the MCMC methods.
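The dynamic programming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Poisson process observed on (start, end], restricts candidate change points to the distinct event times (an assumption made here so that every piece has positive length), and uses the hypothetical names `segment_loglik` and `fit_piecewise_intensity`. With n events there are about n candidate boundaries, so the triple loop runs in O(n²k) time.

```python
import bisect
import math


def segment_loglik(count, length):
    """Poisson log-likelihood of one piece at its ML rate count / length."""
    if count == 0:
        return 0.0  # an empty piece gets rate 0 and contributes nothing
    return count * math.log(count / length) - count


def fit_piecewise_intensity(times, start, end, k):
    """Best k-piece constant intensity on (start, end].

    Returns (log_likelihood, pieces), where pieces is a list of
    (left, right, rate) tuples. Change points are restricted to event
    times, which is an assumption of this sketch.
    """
    times = sorted(times)
    if times and not (start < times[0] and times[-1] <= end):
        raise ValueError("events must lie in (start, end]")
    # Candidate boundaries: both interval ends plus the distinct event times.
    bounds = sorted({start, end, *times})
    m = len(bounds) - 1  # number of elementary intervals
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of intervals")
    # cum[i] = number of events with time <= bounds[i]
    cum = [bisect.bisect_right(times, b) for b in bounds]

    neg_inf = float("-inf")
    # best[p][j]: best log-likelihood covering (bounds[0], bounds[j]] with p pieces
    best = [[neg_inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, m + 1):
            for i in range(p - 1, j):
                if best[p - 1][i] == neg_inf:
                    continue
                score = best[p - 1][i] + segment_loglik(
                    cum[j] - cum[i], bounds[j] - bounds[i]
                )
                if score > best[p][j]:
                    best[p][j] = score
                    back[p][j] = i

    # Walk the back-pointers from the right end to recover the pieces.
    pieces = []
    j = m
    for p in range(k, 0, -1):
        i = back[p][j]
        rate = (cum[j] - cum[i]) / (bounds[j] - bounds[i])
        pieces.append((bounds[i], bounds[j], rate))
        j = i
    pieces.reverse()
    return best[k][m], pieces
```

In this sketch the `bounds` list is the set of potential change points, so a pruning heuristic of the kind the abstract mentions would amount to passing a shorter candidate list into the same recurrence.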

Index Terms

1. Information systems
  1. Information systems applications
2. Mathematics of computing
  1. Probability and statistics

Published in
KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
August 2001
493 pages
ISBN: 158113391X
DOI: 10.1145/502512
Conference Chair: Doheon Lee, Chonnam National University, Korea
General Chair: Mario Schkolnick, SGI
Program Chairs: Foster Provost, New York University; Ramakrishnan Srikant, IBM Almaden Research Center
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
MCMC
event sequence
intensity modeling
Qualifiers
- Article
Conference

Acceptance Rates
KDD '01 paper acceptance rate: 31 of 237 submissions, 13%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.