Discovering Time-Constrained Patterns from Long Sequences

Wang, Changzhou; Kao, Anne; Choi, Jai; Tjoelker, Rod

doi:10.1007/978-3-540-78297-1_5

Part of the book series: Studies in Computational Intelligence (SCI, volume 116)

Summary

In recent years, many techniques have been proposed to discover sequential patterns with temporal constraints from long sequences of categorical events. Central to these techniques is the concept of pattern occurrence. A rich set of measures based on pattern occurrence has been developed for sequential pattern discovery, filtering and ranking. Often, recurrences of patterns within the same sequence are ignored. However, the total number of pattern occurrences in each individual sequence can provide valuable insights, especially for applications with long sequences each containing many events.

In this chapter, we propose to use the maximum cardinality of all disjoint occurrence sets as the number of pattern occurrences in an individual sequence. In contrast to previous proposals, our definition (1) depends only on the sequence and the pattern, without requiring additional parameters such as a sliding window size; (2) ensures that a pattern occurs at least as often when its temporal constraints are relaxed; and (3) enables us to easily estimate the expected number of pattern occurrences, so that patterns whose counted occurrences deviate significantly from their expectations can be regarded as surprising.
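To make the definition concrete, here is a brute-force sketch for the simplest case that can be stated without the chapter's formal model. It assumes (our assumption, not the chapter's notation) a two-event pattern "an `a` followed by a `b` at most `w` time units later", treats the gap bound `w` as part of the pattern itself, and calls two occurrences disjoint when they share no event. The names `occurrences` and `max_disjoint_count` are hypothetical. The search is exponential and only suitable for tiny inputs, but it illustrates property (2): enlarging `w` can only add occurrences, so the maximum never drops.

```python
from itertools import combinations
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp, event type)


def occurrences(seq: List[Event], a: str, b: str, w: float) -> List[Tuple[int, int]]:
    """Index pairs (i, j) where seq[i] is an `a`, seq[j] is a `b`, and 0 < t_j - t_i <= w."""
    return [
        (i, j)
        for i, (ti, ki) in enumerate(seq) if ki == a
        for j, (tj, kj) in enumerate(seq) if kj == b and 0 < tj - ti <= w
    ]


def max_disjoint_count(seq: List[Event], a: str, b: str, w: float) -> int:
    """Largest number of occurrences that pairwise share no event (exhaustive search)."""
    occ = occurrences(seq, a, b, w)
    n_a = sum(1 for _, k in seq if k == a)
    n_b = sum(1 for _, k in seq if k == b)
    # No disjoint set can be larger than the number of available `a` or `b` events.
    for k in range(min(len(occ), n_a, n_b), 0, -1):
        for subset in combinations(occ, k):
            used = [idx for pair in subset for idx in pair]
            if len(used) == len(set(used)):  # no event is reused across occurrences
                return k
    return 0


seq = [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (10, "B")]
for w in (1, 2, 10):
    print(w, max_disjoint_count(seq, "A", "B", w))
# prints:
# 1 1
# 2 2
# 10 2
```

Relaxing `w` from 1 to 2 raises the count from 1 to 2; relaxing it further to 10 adds occurrences involving the `B` at time 10 but cannot raise the count, because only two `A` events exist.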

In addition, we (1) describe a greedy algorithm that efficiently identifies a single disjoint occurrence set with the maximum cardinality; (2) develop a formula to calculate the expected value under a uniform distribution assumption; and (3) design an approximation process to estimate the expected value efficiently. Our experiments show that the greedy algorithm is efficient and the approximation is accurate.
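The chapter's greedy algorithm and expectation formula are not reproduced in this summary, so the following is only a stand-in for a restricted case: a two-event pattern "an `a` followed by a `b` at most `w` time units later", with distinct types `a` and `b`, where two occurrences are disjoint if they share no event. For that case a standard interval-matching greedy is optimal: scan events in time order, discard `a` events too old to pair with the current `b`, and pair the `b` with the earliest remaining `a`, which leaves later `a` events free for later `b` events. For the expected value we swap in Monte Carlo simulation under an assumed null model in which event labels are uniformly permuted over the fixed timestamps; this is neither the chapter's formula nor its approximation process. The names `greedy_disjoint` and `expected_count_mc` are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp, event type)


def greedy_disjoint(events: Sequence[Event], a: str, b: str, w: float) -> List[Tuple[float, float]]:
    """Maximum set of disjoint (t_a, t_b) occurrences with 0 < t_b - t_a <= w.

    Assumes a != b. Each `b` takes the oldest unused `a` still within reach,
    which yields a maximum-cardinality set for this two-event case.
    """
    if a == b:
        raise ValueError("this sketch assumes distinct event types a and b")
    pending: Deque[float] = deque()  # unused `a` timestamps, oldest first
    matches: List[Tuple[float, float]] = []
    for t, kind in sorted(events, key=lambda e: e[0]):  # stable sort: ties keep input order
        if kind == a:
            pending.append(t)
        elif kind == b:
            # An `a` older than t - w is out of reach for this and every later `b`.
            while pending and pending[0] < t - w:
                pending.popleft()
            # Pair with the oldest remaining `a`, provided it strictly precedes t.
            if pending and pending[0] < t:
                matches.append((pending.popleft(), t))
    return matches


def expected_count_mc(events: Sequence[Event], a: str, b: str, w: float,
                      trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected count under an ASSUMED null model:
    event labels are permuted uniformly at random over the fixed timestamps."""
    rng = random.Random(seed)
    times = [t for t, _ in events]
    labels = [k for _, k in events]
    total = 0
    for _ in range(trials):
        rng.shuffle(labels)  # a fresh uniform permutation of the labels each trial
        total += len(greedy_disjoint(list(zip(times, labels)), a, b, w))
    return total / trials


seq = [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (10, "B")]
observed = len(greedy_disjoint(seq, "A", "B", 3))  # pairs (1, 3) and (2, 4)
expected = expected_count_mc(seq, "A", "B", 3)
print("observed:", observed, "expected:", round(expected, 2), "deviation:", observed - expected)
```

A pattern whose observed count lies far from the estimate would be flagged as surprising; how large a deviation counts as significant is a threshold this sketch leaves to the reader.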

Author information

Authors and Affiliations

Boeing Phantom Works, Seattle, WA, USA
Changzhou Wang, Anne Kao, Jai Choi & Rod Tjoelker

Editor information

Editors and Affiliations

Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
Ying Liu
School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore, 639798
Aixin Sun & Ee-Peng Lim
Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117576
Han Tong Loh & Wen Feng Lu

About this chapter

Cite this chapter

Wang, C., Kao, A., Choi, J., Tjoelker, R. (2008). Discovering Time-Constrained Patterns from Long Sequences. In: Liu, Y., Sun, A., Loh, H.T., Lu, W.F., Lim, EP. (eds) Advances of Computational Intelligence in Industrial Systems. Studies in Computational Intelligence, vol 116. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78297-1_5

DOI: https://doi.org/10.1007/978-3-540-78297-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78296-4
Online ISBN: 978-3-540-78297-1
eBook Packages: Engineering, Engineering (R0)