Summary
In recent years, many techniques have been proposed to discover sequential patterns with temporal constraints from long sequences of categorical events. Central to these techniques is the concept of pattern occurrence. A rich set of measures based on pattern occurrence has been developed for sequential pattern discovery, filtering and ranking. Often, recurrences of patterns within the same sequence are ignored. However, the total number of pattern occurrences in each individual sequence can provide valuable insights, especially for applications with long sequences each containing many events.
In this chapter, we propose to use the maximum cardinality of all disjoint occurrence sets as the number of pattern occurrences in an individual sequence. In contrast to previous proposals, our definition (1) depends only on the sequence and the pattern, without requiring additional parameters such as a sliding window size; (2) ensures that patterns occur more often if their temporal constraints are relaxed; and (3) enables us to easily estimate the expected number of pattern occurrences, so that patterns whose counted occurrences deviate significantly from their expectations can be regarded as surprising ones.
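The definition can be illustrated with a small brute-force sketch. It rests on assumptions not spelled out in this summary: a sequence is a time-sorted list of `(time, event)` pairs, a pattern is an ordered list of event types, the only temporal constraint is a hypothetical `max_gap` between consecutive matched events, and two occurrences are disjoint when they share no event. The exhaustive search is exponential and is meant only to make the definition concrete on tiny inputs.

```python
from itertools import combinations

def occurrences(seq, pattern, max_gap):
    """Return all index tuples in `seq` that match `pattern` in order,
    with consecutive matched events at most `max_gap` apart in time.
    `seq` is assumed to be sorted by time."""
    found = []

    def extend(prefix):
        if len(prefix) == len(pattern):
            found.append(tuple(prefix))
            return
        start = prefix[-1] + 1 if prefix else 0
        for i in range(start, len(seq)):
            t, e = seq[i]
            if e != pattern[len(prefix)]:
                continue
            if prefix and t - seq[prefix[-1]][0] > max_gap:
                break  # later events are even further away
            extend(prefix + [i])

    extend([])
    return found

def max_disjoint_count(seq, pattern, max_gap):
    """Largest number of occurrences that pairwise share no event."""
    occs = occurrences(seq, pattern, max_gap)
    for k in range(len(occs), 0, -1):
        for subset in combinations(occs, k):
            used = [i for occ in subset for i in occ]
            if len(used) == len(set(used)):  # no event index reused
                return k
    return 0

events = [(0, "A"), (1, "A"), (2, "B"), (5, "B"), (6, "A"), (9, "B")]
print(max_disjoint_count(events, ["A", "B"], max_gap=2))  # 1
print(max_disjoint_count(events, ["A", "B"], max_gap=4))  # 3
```

On this toy sequence, relaxing the gap from 2 to 4 raises the count from 1 to 3, consistent with property (2): every occurrence under the tighter constraint is still an occurrence under the looser one.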
In addition, we (1) describe a greedy algorithm which efficiently identifies a single disjoint occurrence set that has the maximum cardinality; (2) develop a formula to calculate the expected value under a uniform distribution assumption; and (3) design an approximation process to efficiently estimate the expected value. Our experiments show that the greedy algorithm is efficient and the approximation is accurate.
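The chapter's greedy algorithm is not reproduced in this summary. As a rough illustration of the idea only, the sketch below handles a deliberately restricted case: a two-event pattern (`first` followed by `second`, with two distinct event types) under a single hypothetical `max_gap` constraint, over a time-sorted list of `(time, event)` pairs. Each `second` event is paired with the oldest still-eligible unmatched `first` event, an earliest-deadline-first rule. The function name and parameters are assumptions made for this example, not the authors' formulation.

```python
from collections import deque

def greedy_pair_count(seq, first, second, max_gap):
    """Count disjoint (first -> second) occurrences in one pass.
    Assumes `seq` is sorted by time and `first != second`."""
    pending = deque()  # times of unmatched `first` events, oldest first
    count = 0
    for t, e in seq:
        # Discard `first` events that can no longer be matched.
        while pending and t - pending[0] > max_gap:
            pending.popleft()
        if e == second and pending:
            pending.popleft()  # pair with the oldest eligible `first`
            count += 1
        elif e == first:
            pending.append(t)
    return count

events = [(0, "A"), (1, "A"), (2, "B"), (5, "B"), (6, "A"), (9, "B")]
print(greedy_pair_count(events, "A", "B", max_gap=2))  # 1
print(greedy_pair_count(events, "A", "B", max_gap=4))  # 3
```

Pairing with the oldest eligible event keeps the more recent `first` events available for later `second` events, which is why a single left-to-right pass suffices in this restricted setting. Longer patterns and richer constraints need the treatment given in the chapter itself.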
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wang, C., Kao, A., Choi, J., Tjoelker, R. (2008). Discovering Time-Constrained Patterns from Long Sequences. In: Liu, Y., Sun, A., Loh, H.T., Lu, W.F., Lim, EP. (eds) Advances of Computational Intelligence in Industrial Systems. Studies in Computational Intelligence, vol 116. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78297-1_5
DOI: https://doi.org/10.1007/978-3-540-78297-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78296-4
Online ISBN: 978-3-540-78297-1
eBook Packages: Engineering, Engineering (R0)