A general streaming algorithm for pattern discovery

  • Regular Paper
Knowledge and Information Systems

Abstract

Discovering frequent patterns over event sequences is an important data mining problem. Existing methods typically require multiple passes over the data, rendering them unsuitable for streaming contexts. We present the first streaming algorithm for mining frequent patterns over a window of recent events in the stream. We derive approximation guarantees for our algorithm in terms of: (i) the separation of frequent patterns from the infrequent ones, and (ii) the rate of change of stream characteristics. Our parameterization of the problem provides a new sweet spot in the tradeoff between making distributional assumptions over the stream and algorithmic efficiencies of mining. We illustrate how this yields significant benefits when mining practical streams from neuroscience and telecommunications logs.

Notes

  1. Level-wise algorithms start with patterns of size 1 and with each increasing level estimate frequent patterns of the next size.

  2. Other models such as the landmark and time-fading models have also been studied [11], but we do not consider them here.

  3. Border sets were employed by [4] for efficient mining of dynamic databases. Whenever a new itemset becomes frequent, however, that approach needs multiple passes over older data, which is not feasible in a streaming context.

  4. \(\mathrm{f_{score}}=2\cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}\).
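
As a worked illustration of the formula in note 4, the following minimal Python sketch computes precision, recall and the f-score for a set of reported patterns against a reference set. The function names and the toy pattern labels are hypothetical and chosen only for illustration; they do not come from the article, and returning 0.0 when a denominator is empty is an assumption of this sketch.

```python
def precision_recall(reported: set, truth: set) -> tuple[float, float]:
    """Return (precision, recall) of `reported` measured against `truth`."""
    true_positives = len(reported & truth)
    # Assumption of this sketch: an empty denominator yields 0.0.
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, as in note 4."""
    if precision + recall == 0:
        return 0.0  # assumption: define the score as 0 when both terms are 0
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical pattern labels.
reported = {"A->B", "B->C", "C->D", "A->D"}  # patterns an algorithm outputs
truth = {"A->B", "B->C", "D->E"}             # patterns that are truly frequent

p, r = precision_recall(reported, truth)
score = f_score(p, r)
print(f"precision={p:.3f} recall={r:.3f} f_score={score:.3f}")
```

Here two of the four reported patterns are correct (precision 1/2) and two of the three true patterns are found (recall 2/3), so the f-score is 4/7.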

References

  1. Achar A et al (2012) Discovering injective episodes with general partial orders. Data Min Knowl Discov 25(1):67–108

  2. Agrawal J et al (2008) Efficient pattern matching over event streams. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, SIGMOD ’08, pp 147–160

  3. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases, pp 487–499

  4. Aumann Y et al (1999) Borders: an efficient algorithm for association generation in dynamic databases. J Int Inf Syst (JIIS) 1:61–73

  5. Babcock B et al (2002) Models and issues in data stream systems. In: Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 1–16

  6. Calders T et al (2007) Mining frequent itemsets in a stream. In: Proceedings of the 7th IEEE international conference on data mining (ICDM), pp 83–92

  7. Chandramouli B et al (2010) High-performance dynamic pattern matching over disordered streams. Proc VLDB Endow 3(1–2):220–231

  8. Chandramouli B et al (2012) Temporal analytics on big data for web advertising. In: Proceedings of the international conference of data engineering (ICDE)

  9. Chang JH, Lee WS (2003) Finding recent frequent itemsets adaptively over online data streams. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 487–492

  10. Chang JH, Lee WS (2004) A sliding window method for finding recently frequent itemsets over online data streams. J Inf Sci Eng 20(4):753–762

  11. Cheng J et al (2008) A survey on algorithms for mining frequent itemsets over data streams. Knowl Inf Syst 16(1):1–27

  12. Jin R, Agrawal G (2005) An algorithm for in-core frequent itemset mining on streaming data. In: Proceedings of the 5th IEEE international conference on data mining (ICDM), pp 210–217

  13. Karp RM et al (2003) A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst 28:51–55

  14. Lam HT et al (2011) Online discovery of top-k similar motifs in time series data. In: SIAM international conference of data mining, pp 1004–1015

  15. Laxman S (2006) Discovering frequent episodes: fast algorithms, connections with HMMs and generalizations. PhD thesis, IISc, Bangalore, India

  16. Laxman S, Sastry PS (2006) A survey of temporal data mining. Sādhanā Acad Proc Eng Sci 31:173–198

  17. Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: Proceedings of the 28th international conference on very large data bases (VLDB), pp 346–357

  18. Mannila H et al (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289

  19. Mendes L, Ding B, Han J (2008) Stream sequential pattern mining with precise error bounds. In: Proceedings of the 8th IEEE international conference on data mining (ICDM), pp 941–946

  20. Mueen A, Keogh E (2010) Online discovery and maintenance of time series motifs. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD)

  21. Muthukrishnan S (2005) Data streams: algorithms and applications. Found Trends Theoret Comput Sci 1(2):117–236

  22. Patnaik D, Laxman S, Ramakrishnan N (2009) Discovering excitatory networks from discrete event streams with applications to neuronal spike train analysis. In: Proceedings of the 9th IEEE international conference on data mining (ICDM)

  23. Patnaik D, Marwah M, Sharma R, Ramakrishnan N (2009) Sustainable operation and management of data center chillers using temporal data mining. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1305–1314

  24. Patnaik D et al (2012) Streaming algorithms for pattern discovery over dynamically changing event sequences, CoRR abs/1205.4477

  25. Pei J et al (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the 17th international conference on data engineering (ICDE), pp 215–224

  26. Ramakrishnan N, Patnaik D, Sreedharan V (2009) Temporal process discovery in many guises. IEEE Comput 42(8):97–101

  27. Wagenaar DA et al (2006) An extremely rich repertoire of bursting patterns during the development of cortical cultures. BMC Neurosci 7(1):11

  28. Wang J et al (2005) TFP: an efficient algorithm for mining top-k frequent closed itemsets. IEEE Trans Knowl Data Eng 17(5):652–664

  29. Wong RC-W, Fu AW-C (2006) Mining top-k frequent itemsets from data streams. Data Min Knowl Discov 13:193–217

  30. Yan X, Han J (2003) CloseGraph: mining closed frequent subgraph patterns. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03)

Acknowledgments

This work is supported in part by US National Science Foundation grants IIS-0905313 and CCF-0937133 and by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) contract number D12PC000337. The US Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the US Government.

Author information

Corresponding author

Correspondence to Debprakash Patnaik.

About this article

Cite this article

Patnaik, D., Laxman, S., Chandramouli, B. et al. A general streaming algorithm for pattern discovery. Knowl Inf Syst 37, 585–610 (2013). https://doi.org/10.1007/s10115-013-0669-z

  • DOI: https://doi.org/10.1007/s10115-013-0669-z
