ABSTRACT
Sequences of events are an important type of data arising in various applications, including telecommunications, biostatistics, and web access analysis. A basic approach to modeling such sequences is to find the underlying intensity functions, which describe the expected number of events per time unit. Typically, the intensity functions are assumed to be piecewise constant. We consider different ways of fitting such intensity models to event sequence data. We start with a Bayesian approach using Markov chain Monte Carlo (MCMC) methods with a varying number of pieces. These methods can be used to produce posterior distributions on the intensity functions, and they can also accommodate covariates. The drawback is that they are computationally intensive and thus not well suited to data mining applications in which large numbers of intensity functions have to be estimated. We then consider dynamic programming approaches to finding the change points in the intensity functions. These methods can find the maximum likelihood intensity function in O(n^2 k) time for a sequence of n events and k pieces of intensity. We show that simple heuristics can be used to prune the number of potential change points, yielding speedups of several orders of magnitude. The results of the improved dynamic programming method correspond very closely to the posterior averages produced by the MCMC methods.
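The dynamic programming idea above can be illustrated with a small sketch. This is a minimal example under our own assumptions, not the authors' implementation: events are treated as a Poisson process observed on a window [start, end], each piece gets its maximum likelihood rate (event count divided by piece length), and candidate change points are restricted to midpoints between consecutive events (a hypothetical choice). The function names `segment_loglik` and `fit_piecewise_constant` are hypothetical. The abstract does not describe the pruning heuristics, so the sketch evaluates every candidate; a pruning step would simply shorten the `bounds` list before the O(n^2 k) loop runs.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def segment_loglik(count: int, length: float) -> float:
    """Maximized log-likelihood of one constant-rate Poisson piece.

    With the MLE rate r = count / length, count*log(r) - r*length
    simplifies to count*log(count/length) - count (0 when count == 0).
    """
    if count == 0:
        return 0.0
    return count * math.log(count / length) - count


def fit_piecewise_constant(
    times: Sequence[float], start: float, end: float, k: int
) -> Tuple[float, List[float], List[float]]:
    """Fit a k-piece constant intensity by dynamic programming (sketch).

    Assumptions (illustrative only): all events lie strictly inside
    (start, end), and change points may only fall at midpoints between
    consecutive distinct events. Returns (log-likelihood, change points, rates).
    """
    ts = sorted(times)
    # Candidate boundaries: window ends plus midpoints between distinct events.
    bounds = [start] + [(a + b) / 2 for a, b in zip(ts, ts[1:]) if a < b] + [end]
    m = len(bounds) - 1  # number of elementary intervals
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of candidate intervals")
    # cum[i] = number of events before bounds[i]; counts per piece are differences.
    cum = [bisect.bisect_left(ts, b) for b in bounds]

    neg_inf = float("-inf")
    # best[p][j]: best log-likelihood covering bounds[0]..bounds[j] with p pieces.
    best = [[neg_inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, m + 1):
            for i in range(p - 1, j):  # last piece spans bounds[i]..bounds[j]
                if best[p - 1][i] == neg_inf:
                    continue
                score = best[p - 1][i] + segment_loglik(
                    cum[j] - cum[i], bounds[j] - bounds[i]
                )
                if score > best[p][j]:
                    best[p][j] = score
                    back[p][j] = i

    # Trace back the chosen boundary indices from the right end.
    idx = [m]
    for p in range(k, 0, -1):
        idx.append(back[p][idx[-1]])
    idx.reverse()  # now idx[0] == 0 and idx[-1] == m
    change_points = [bounds[i] for i in idx[1:-1]]
    rates = [
        (cum[b] - cum[a]) / (bounds[b] - bounds[a]) for a, b in zip(idx, idx[1:])
    ]
    return best[k][m], change_points, rates


if __name__ == "__main__":
    # Synthetic data: dense events on [0, 10], two sparse events afterwards.
    events = [0.25 + 0.5 * i for i in range(20)] + [12.5, 17.5]
    ll, cps, rs = fit_piecewise_constant(events, 0.0, 20.0, k=2)
    print(ll, cps, rs)
```

The triple loop mirrors the O(n^2 k) bound stated in the abstract: for each number of pieces p and each right boundary j, every earlier boundary i is tried as the start of the last piece.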
Index Terms
- Finding simple intensity descriptions from event sequence data
Recommendations
Using Markov chain Monte Carlo and dynamic programming for event sequence data
Sequences of events are a common type of data in various scientific and business applications, e.g. telecommunication network management, study of web access logs, biostatistics and epidemiology. A natural approach to modelling event sequences is using ...
An index-based method for timestamped event sequence matching
DEXA'05: Proceedings of the 16th international conference on Database and Expert Systems Applications
This paper addresses the problem of timestamped event sequence matching, a new type of sequence matching that retrieves the occurrences of interesting patterns from a timestamped event sequence. Timestamped event sequence matching is useful for ...
Mining Episode Rules from Event Sequences Under Non-overlapping Frequency
Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices
Frequent episode mining is a popular framework for retrieving useful information from an event sequence. Many algorithms have been proposed to mine frequent episodes and to derive episode rules from them with respect to a given frequency function ...