
Part of the book series: Studies in Computational Intelligence (SCI, volume 116)


Summary

In recent years, many techniques have been proposed to discover sequential patterns with temporal constraints from long sequences of categorical events. Central to these techniques is the concept of pattern occurrence. A rich set of measures based on pattern occurrence has been developed for sequential pattern discovery, filtering and ranking. Often, recurrences of patterns within the same sequence are ignored. However, the total number of pattern occurrences in each individual sequence can provide valuable insights, especially for applications with long sequences each containing many events.

In this chapter, we propose to use the maximum cardinality of all disjoint occurrence sets as the number of pattern occurrences in an individual sequence. In contrast to previous proposals, our definition (1) depends only on the sequence and the pattern, without requiring additional parameters such as a sliding window size; (2) ensures that patterns occur at least as often when their temporal constraints are relaxed; and (3) enables us to easily estimate the expected number of pattern occurrences, so that patterns whose counted occurrences deviate significantly from their expectations can be regarded as surprising.
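The definition above, and the monotonicity in (2), can be made concrete with a small brute-force sketch. It assumes a pattern model that this summary does not spell out: a pattern is a list of event types with an assumed `(min_gap, max_gap)` bound on the time between consecutive pattern events, an occurrence is a tuple of event indices matching it, and two occurrences are disjoint when they share no event. The names `occurrences` and `max_disjoint` are hypothetical, and the exhaustive search is exponential, so it only suits tiny inputs; it is not the authors' greedy algorithm.

```python
from typing import FrozenSet, List, Sequence, Tuple

Event = Tuple[int, str]   # (timestamp, event type); the sequence is sorted by timestamp
Gap = Tuple[int, int]     # assumed (min_gap, max_gap) between consecutive pattern events


def occurrences(seq: Sequence[Event], types: Sequence[str],
                gaps: Sequence[Gap]) -> List[Tuple[int, ...]]:
    """Every index tuple j1 < ... < jk whose events have the pattern's types
    and whose consecutive time differences fall inside the gap bounds."""
    found: List[Tuple[int, ...]] = []

    def extend(prefix: List[int]) -> None:
        i = len(prefix)
        if i == len(types):
            found.append(tuple(prefix))
            return
        start = prefix[-1] + 1 if prefix else 0
        for j in range(start, len(seq)):
            t, e = seq[j]
            if e != types[i]:
                continue
            if prefix:
                lo, hi = gaps[i - 1]
                if not lo <= t - seq[prefix[-1]][0] <= hi:
                    continue
            extend(prefix + [j])

    extend([])
    return found


def max_disjoint(occs: Sequence[Tuple[int, ...]]) -> int:
    """Largest number of occurrences that pairwise share no event,
    found by exhaustive include/exclude search (exponential time)."""
    best = 0

    def search(k: int, used: FrozenSet[int], count: int) -> None:
        nonlocal best
        if count + (len(occs) - k) <= best:
            return  # even taking every remaining occurrence cannot beat best
        if k == len(occs):
            best = count
            return
        if used.isdisjoint(occs[k]):
            search(k + 1, used | frozenset(occs[k]), count + 1)
        search(k + 1, used, count)

    search(0, frozenset(), 0)
    return best


seq = [(1, "A"), (2, "B"), (3, "A"), (4, "B")]
print(max_disjoint(occurrences(seq, ["A", "B"], [(2, 3)])))  # tight gap: 1
print(max_disjoint(occurrences(seq, ["A", "B"], [(1, 3)])))  # relaxed gap: 2
```

Relaxing the gap from `(2, 3)` to `(1, 3)` only adds occurrences, so any disjoint set found under the tighter constraint remains valid, which is why the count cannot decrease.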

In addition, we (1) describe a greedy algorithm which efficiently identifies a single disjoint occurrence set that has the maximum cardinality; (2) develop a formula to calculate the expected value under a uniform distribution assumption; and (3) design an approximation process to efficiently estimate the expected value. Our experiments show that the greedy algorithm is efficient and the approximation is accurate.
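A greedy count and an expected-value estimate can be sketched for the simplest case, a two-event pattern `a -> b` whose gap must lie in an assumed window `[lo, hi]`. For such a pattern, disjoint occurrences amount to a matching between `a` events and `b` events, and letting each `a`, in time order, claim the earliest unclaimed `b` inside its window yields a maximum matching by a standard exchange argument. This is a simplified stand-in, not the authors' greedy algorithm for general patterns. Likewise, `expected_count` replaces their closed-form formula and approximation with plain Monte Carlo simulation under one reading of the uniform assumption (one event per time unit, each type drawn independently and uniformly). All function names are hypothetical.

```python
import random
from bisect import bisect_left
from typing import Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, event type), sorted by timestamp


def greedy_count(seq: Sequence[Event], a: str, b: str, lo: int, hi: int) -> int:
    """Maximum number of event-disjoint occurrences of a -> b, where an
    occurrence pairs an `a` at time t with a `b` at time t + g, lo <= g <= hi.
    Assumes a != b and 1 <= lo <= hi (hypothetical two-event model).

    Each `a`, in time order, claims the earliest unclaimed `b` in its window
    [t + lo, t + hi]. All windows have the same width, so they are ordered
    the same way by both endpoints, which makes this greedy choice optimal.
    """
    b_times = [t for t, e in seq if e == b]
    claimed = [False] * len(b_times)
    count = 0
    for t, e in seq:
        if e != a:
            continue
        k = bisect_left(b_times, t + lo)  # first b at or after the window start
        while k < len(b_times) and b_times[k] <= t + hi:
            if not claimed[k]:
                claimed[k] = True
                count += 1
                break
            k += 1
    return count


def expected_count(n: int, alphabet: Sequence[str], a: str, b: str,
                   lo: int, hi: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected count when each of n events,
    at times 1..n, has a type drawn independently and uniformly from
    `alphabet` (an assumed reading of the uniform distribution model)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = [(t, rng.choice(alphabet)) for t in range(1, n + 1)]
        total += greedy_count(seq, a, b, lo, hi)
    return total / trials


seq = list(enumerate("ABABCAB", start=1))
observed = greedy_count(seq, "A", "B", 1, 3)  # 3
baseline = expected_count(len(seq), ["A", "B", "C"], "A", "B", 1, 3)
print(observed, baseline)
```

A count well above `baseline` would hint at a surprising pattern; more trials tighten the estimate at the cost of running time. Because the same seed reproduces the same random sequences, estimates for a tight and a relaxed window can be compared directly.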

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wang, C., Kao, A., Choi, J., Tjoelker, R. (2008). Discovering Time-Constrained Patterns from Long Sequences. In: Liu, Y., Sun, A., Loh, H.T., Lu, W.F., Lim, EP. (eds) Advances of Computational Intelligence in Industrial Systems. Studies in Computational Intelligence, vol 116. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78297-1_5

  • DOI: https://doi.org/10.1007/978-3-540-78297-1_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78296-4

  • Online ISBN: 978-3-540-78297-1

  • eBook Packages: Engineering, Engineering (R0)
