Mining frequent generators and closures in data streams with FGC-Stream

  • Regular Paper
Knowledge and Information Systems

Abstract

Mining frequent itemsets (FIs) from data streams is challenging: the available resources are limited with respect to the typically large size of the result, and data evolution calls for frequent recalculations. Therefore, the mining of condensed representations, e.g. frequent closures (FCIs) or frequent generators (FGIs), instead of plain FIs, has been explored. So far, however, the mining of FGIs and that of FCIs over data streams have only been addressed separately. Yet both itemset families appear together in the solutions of a range of practical problems and also underlie the definition of handy association rule bases. To date, the joint mining task can only be approached by combining two dedicated miners. As a remedy, we propose a holistic approach rooted in the support-set-based equivalence classes underlying a transaction dataset: the resulting FGC-Stream miner exploits mathematical results about the evolution of those classes to efficiently update both FCIs and FGIs. Targeting the sliding window mode, in which the window over a stream both expands and shrinks, we enhance results from formal concept analysis to design an efficient expansion procedure. On window shrinking, we exploit entirely new results about class evolution. Overall, FGC-Stream achieves significant effort factoring through the collaborative maintenance of FCIs and FGIs. As a result, in our experimental comparisons it largely outperformed its only FGI-mining competitor while keeping up with two of the most efficient FCI miners. This outcome indicates that FGC-Stream should dominate any combination of dedicated miners for the joint task. This article is an extended version of our paper [27] presented at the 21st IEEE International Conference on Data Mining (ICDM).
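
To make the notions above concrete, the following Python sketch groups the frequent itemsets of a sliding window into support-set equivalence classes and reports, for each class, its closure (an FCI) and its minimal generators (FGIs). It is a brute-force illustration of the definitions, not FGC-Stream itself: it recomputes everything after each window update, which is precisely the cost an incremental miner is meant to avoid. The names SlidingWindow and equivalence_classes, the fixed window size, and the absolute minimum support count are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


class SlidingWindow:
    """Fixed-size window of (tid, transaction) pairs (hypothetical helper)."""

    def __init__(self, size):
        self.buffer = deque(maxlen=size)
        self.next_tid = 0

    def add(self, transaction):
        # Expansion and shrinking happen in one step here:
        # a deque with maxlen silently drops its oldest entry when full.
        self.buffer.append((self.next_tid, frozenset(transaction)))
        self.next_tid += 1


def support_set(itemset, window):
    """Tids of the window transactions that contain every item of itemset."""
    return frozenset(tid for tid, t in window if itemset <= t)


def equivalence_classes(window, min_supp):
    """Map each frequent closure to its minimal generators.

    Brute force: enumerates every itemset over the window's items,
    so it is exponential in the number of distinct items.
    """
    items = sorted(set().union(*(t for _, t in window)))
    classes = {}  # support set -> frequent itemsets sharing that support set
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            tids = support_set(itemset, window)
            if len(tids) >= min_supp:
                classes.setdefault(tids, []).append(itemset)
    result = {}
    for members in classes.values():
        # The closure is the unique largest member; all others are its subsets.
        closure = max(members, key=len)
        # Generators are the members with no proper subset in the same class.
        result[closure] = [m for m in members if not any(o < m for o in members)]
    return result


if __name__ == "__main__":
    w = SlidingWindow(3)
    for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}):
        w.add(t)
    print(equivalence_classes(w.buffer, 2))
    w.add({"b", "c"})  # the window is full, so {a, b, c} is evicted
    print(equivalence_classes(w.buffer, 2))
```

On the window {abc, ab, ac} with a minimum support of 2, the frequent closures are {a} (generated by the empty set), {a, b} (generated by {b}) and {a, c} (generated by {c}). Appending bc evicts abc, after which every frequent itemset is its own closure and its own generator, which shows how a single window slide can merge or split equivalence classes.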

Figs. 1–13 (images not available)

Notes

  1. The term jumper was first introduced by Rouane et al. [34] in the context of incrementally computing the iceberg lattice of FCIs.

  2. https://github.com/Louis-Romain-Roux/FGC-Stream.

  3. http://fimi.ua.ac.be/data/.

  4. https://gitlab.com/adaptdata/e2.

  5. https://www.darpa.mil/program/transparent-computing.

References

  1. Aggarwal CC (2007) Data streams: models and algorithms, vol 31, Springer Science & Business Media

  2. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, D.C., pp 207–216

  3. Alam K et al (2017) Enabling far-edge analytics: performance profiling of frequent pattern mining algorithms. IEEE Access 5:8236–8249

  4. Barbut M, Monjardet B (1970) Ordre et classification. Hachette

  5. Benabderrahmane S et al (2021) A rule mining-based advanced persistent threats detection system. In: 30th IJCAI

  6. Calders T et al (2004) A survey on condensed representations for frequent sets. Constraint Based Min Induc Databases 3848:64–80

  7. Calders T et al (2014) Mining frequent itemsets in a stream. Inf Syst 39:233–255

  8. Chi Y et al (2004) Moment: maintaining closed frequent itemsets over a stream sliding window. In: ICDM’04, IEEE, pp 59–66

  9. Cormode G, Muthukrishnan S (2005) What’s new: finding significant differences in network data streams. IEEE/ACM Trans Netw 13(6):1219–1232

  10. Eiter T, Gottlob G (1995) Identifying the minimal transversals of a hypergraph and related problems. SIAM J Comput 24(6):1278–1304

  11. Ganter B, Wille R (1999) Formal concept analysis: mathematical foundations. Springer, Berlin/Heidelberg

  12. Gao C, Wang J (2009) Efficient itemset generator discovery over a stream sliding window. In: 18th ACM CIKM, pp 355–364

  13. Godin R, Missaoui R (1994) An incremental concept formation approach for learning from databases. Theor Comput Sci J 133:387–419

  14. Godin R et al (1995) Incremental concept formation algorithms based on Galois (Concept) lattices. Comput Intell 11(2):246–267

  15. Hamadi S et al (2016) Compiling packet forwarding rules for switch pipelined architecture. In: The 35th IEEE INFOCOM, IEEE, pp 1–9

  16. Jiang N, Gruenwald L (2006) CFI-Stream: mining closed frequent itemsets in data streams. In: 12th ACM SIGKDD, ACM, pp 592–597

  17. Johnson D et al (1988) On generating all maximal independent sets. Inf Process Lett 27(3):119–123

  18. Karim R et al (2018) Mining maximal frequent patterns in transactional databases and dynamic data streams: a spark-based approach. Inf Sci 432:278–300

  19. Kryszkiewicz M (2002) Concise representations of association rules. In: ESF Explor. WS on Pattern Detect. and Discov., pp 92–109

  20. Li H-F et al (2006) A new algorithm for maintaining closed frequent itemsets in data streams by incremental updates. In: ICDM Workshops 2006, IEEE, pp 672–676

  21. Li J (2006) Minimum description length principle: generators are preferable to closed patterns. In: AAAI, pp 409–414

  22. Li J et al (2007) Mining statistically important equivalence classes and delta-discriminative emerging patterns. In: 13th ACM SIGKDD, pp 430–439

  23. Liu G et al (2008) A new concise representation of frequent itemsets using generators and a positive border. Knowl Inf Syst 17(1):35–56

  24. Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Discov 1(3):241–258

  25. Mao G et al (2007) Mining maximal frequent itemsets from data streams. J Inf Sci 33(3):251–262

  26. Martin T et al (2020) Ciclad: a fast and memory-efficient closed itemset miner for streams. In: 26th ACM SIGKDD, pp 1810–1818

  27. Martin T et al (2021) FGC-Stream: a novel joint miner for frequent generators and closed itemsets in data streams. In: 21st IEEE ICDM

  28. McKeown N et al (2008) Openflow: enabling innovation in campus networks. ACM SIGCOMM Comput Commun Rev 38(2):69–74

  29. Nehme K et al (2005) On computing the minimal generator family for concept lattices and icebergs. In: 3rd ICFCA, pp 192–207

  30. Nunes B et al (2014) A survey of software-defined networking: past, present, and future of programmable networks. IEEE Commun Surv Tutor 16(3):1617–1634

  31. Pasquier N et al (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1):25–46

  32. Pfaltz J (2002) Incremental transformation of lattices: a key to effective knowledge discovery. In: 1st ICGT, pp 351–362

  33. Rashid M et al (2013) Mining associated sensor patterns for data stream of wireless sensor networks. In: 8th ACM WS on Perform Monitoring and Measurement of Heterog Wireless and Wired Nets, pp 91–98

  34. Rouane M et al (2004) On-line maintenance of iceberg concept lattices. In: Contributions to the 12th ICCS, p 14

  35. Ryang H, Yun U (2016) High utility pattern mining over data streams with sliding window technique. Expert Syst Appl 57:214–231

  36. Stumme G et al (2002) Computing iceberg concept lattices with titanic. Data Knowl Eng 42(2):189–222

  37. Szathmary L et al (2007) Towards rare itemset mining. In: 19th IEEE ICTAI, vol 1, pp 305–312

  38. Szathmary L et al (2009) Efficient vertical mining of frequent closures and generators. In: 8th IDA, pp 393–404

  39. Szathmary L et al (2014) A fast compound algorithm for mining generators, closed itemsets, and computing links between equivalence classes. Ann Math Artif Intell 70(1–2):81–105

  40. Valiant L (1979) The complexity of enumeration and reliability problems. SIAM J Comput 8(3):410–421

  41. Valtchev P et al (2002) Generating frequent itemsets incrementally: two novel approaches based on galois lattice theory. J Exp Theor Artif Intell 14(2–3):115–142

  42. Valtchev P et al (2003) A generic scheme for the design of efficient on-line algorithms for lattices. In: 11th ICCS, pp 282–295

  43. Valtchev P et al (2008) A framework for incremental generation of closed itemsets. Discrete Appl Math 156:924–949

  44. Yen S et al (2009) An efficient algorithm for maintaining frequent closed itemsets over data stream. In: IEA/AIE, pp 767–776

  45. Yen S et al (2011) A fast algorithm for mining frequent closed itemsets over stream sliding window. In: 2011 IEEE International Conference on Fuzzy Systems (FUZZ), IEEE, pp 996–1002

  46. Zaki MJ, Hsiao C-J (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478

Acknowledgements

The authors would like to thank Y. Chi for the code of MomentFP and E. Hamon for his insightful remarks.

Author information

Corresponding author

Correspondence to Tomas Martin.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Martin, T., Valtchev, P. & Roux, LR. Mining frequent generators and closures in data streams with FGC-Stream. Knowl Inf Syst 65, 3295–3335 (2023). https://doi.org/10.1007/s10115-023-01852-3

  • DOI: https://doi.org/10.1007/s10115-023-01852-3
