Mining Surprising Periodic Patterns

Yang, Jiong; Wang, Wei; Yu, Philip S.

doi:10.1023/B:DAMI.0000031631.84034.af

Published: September 2004

Volume 9, pages 189–216, (2004)

Data Mining and Knowledge Discovery

Abstract

In this paper, we focus on mining surprising periodic patterns in a sequence of events. In many applications, e.g., computational biology, an infrequent pattern is still considered very significant if its actual occurrence frequency exceeds the prior expectation by a large margin. A traditional metric such as support is not necessarily the ideal model for measuring this kind of surprising pattern, because it treats all patterns equally: every occurrence carries the same weight in assessing the significance of a pattern, regardless of its probability of occurrence. A more suitable measure, information, is introduced to naturally value the degree of surprise of each occurrence of a pattern as a continuous and monotonically decreasing function of its probability of occurrence. This allows patterns with vastly different occurrence probabilities to be handled seamlessly. The concept of information gain, defined as the accumulated degree of surprise of all repetitions of a pattern, is proposed to measure the overall degree of surprise of the pattern within a data sequence. Because the information gain measure violates the downward closure property, the bounded information gain property is identified to address this predicament, and it in turn provides an efficient solution to the problem. Furthermore, the user can choose between specifying a minimum information gain threshold and specifying the number of surprising patterns wanted. Empirical tests demonstrate the efficiency and usefulness of the proposed model.
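The measures named in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the information of an event is taken as the negative base-2 logarithm of its empirical frequency, a pattern's information is the sum over its non-wildcard positions, support counts period-aligned matching segments, and information gain is information times support. The paper's exact definitions may differ. The helper names (`event_information`, `support`, `information_gain`, `select_patterns`) and the toy sequence are hypothetical.

```python
import math
from collections import Counter

WILDCARD = "*"  # hypothetical "don't care" symbol for a pattern position


def event_information(sequence):
    """Assumed measure: -log2 of each event's empirical frequency (in bits)."""
    counts = Counter(sequence)
    total = len(sequence)
    return {event: -math.log2(n / total) for event, n in counts.items()}


def pattern_information(pattern, info):
    """Sum the information of the non-wildcard events in the pattern."""
    return sum(info[e] for e in pattern if e != WILDCARD)


def support(pattern, sequence):
    """Count period-aligned, non-overlapping segments that match the pattern."""
    period = len(pattern)
    matches = 0
    for start in range(0, len(sequence) - period + 1, period):
        segment = sequence[start:start + period]
        if all(p == WILDCARD or p == s for p, s in zip(pattern, segment)):
            matches += 1
    return matches


def information_gain(pattern, sequence, info):
    """Assumed accumulation: information of the pattern times its support."""
    return pattern_information(pattern, info) * support(pattern, sequence)


def select_patterns(gains, min_gain=None, top_k=None):
    """Filter by a minimum gain threshold and/or keep the k highest gains."""
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    if min_gain is not None:
        ranked = [(p, g) for p, g in ranked if g >= min_gain]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


# Toy sequence with period 4; frequencies are a=1/2, b=1/4, c=1/8, d=1/8.
sequence = list("abacabadabacabad")
info = event_information(sequence)

candidates = [
    ("a", "*", "*", "*"),  # frequent but unsurprising
    ("*", "*", "*", "c"),  # rarer but more surprising per occurrence
    ("a", "b", "a", "c"),  # super-pattern of both patterns above
]
gains = {p: information_gain(p, sequence, info) for p in candidates}
for pattern, gain in select_patterns(gains):
    print("".join(pattern), support(pattern, sequence), gain)
```

Under these assumptions, the rarer pattern `***c` (support 2) scores higher than the more frequent `a***` (support 4), and the super-pattern `abac` scores higher than either of its sub-patterns. A threshold between those values would therefore admit the super-pattern while rejecting a sub-pattern, which is the kind of downward closure violation the abstract refers to. `select_patterns` mirrors the two user options mentioned in the abstract: a minimum gain threshold or a requested number of patterns.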

Author information

Authors and Affiliations

Jiong Yang: Computer Science Department, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave., Urbana, IL, 61801, USA
Wei Wang: Computer Science Department, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
Philip S. Yu: IBM T. J. Watson Research Center, 19 Skyline Dr., Hawthorne, NY, 10532, USA

About this article

Cite this article

Yang, J., Wang, W. & Yu, P.S. Mining Surprising Periodic Patterns. Data Mining and Knowledge Discovery 9, 189–216 (2004). https://doi.org/10.1023/B:DAMI.0000031631.84034.af

Issue Date: September 2004
DOI: https://doi.org/10.1023/B:DAMI.0000031631.84034.af
