Abstract
Mining frequent itemsets (FIs) from data streams is challenging because resources are limited relative to the typically large size of the result, and because data evolution forces frequent recalculation. Therefore, the mining of condensed representations, e.g. frequent closures (FCIs) or generators (FGIs), instead of plain FIs, has been explored. So far, the tasks of mining FGIs and FCIs over data streams have only been addressed separately. Yet both itemset families combine in the solutions of a range of practical problems, and they also underlie the definition of handy association rule bases. To date, the joint mining task can only be approached by combining two dedicated miners. As a remedy, we propose a holistic approach rooted in the support set-based equivalence classes underlying a transaction dataset: the ensuing \(\textit{FGC-Stream}\) miner exploits mathematical results about the evolution of those classes to efficiently update both FCIs and FGIs. Targeting a sliding window mode, where the window over a stream expands and shrinks, we enhance results from formal concept analysis to design an efficient expansion procedure. On window shrinking, we exploit entirely new results about class evolution. Overall, \(\textit{FGC-Stream}\) factors out a significant amount of effort through the collaborative maintenance of FCIs and FGIs. As a result, in our experiments it largely outperformed its only FGI mining competitor while keeping up with two of the most efficient FCI miners. This outcome indicates that \(\textit{FGC-Stream}\) will dominate any combination of dedicated miners for the joint task. This article is an extended version of our paper [27] presented at the 21st International Conference on Data Mining.
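To make the notions of support set-based equivalence class, closure, and generator concrete, here is a minimal brute-force sketch on a static toy dataset. It is not the FGC-Stream algorithm: it enumerates all itemsets and performs no incremental maintenance. The function names and the toy transactions are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def support_set(itemset, transactions):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)


def equivalence_classes(transactions, min_support):
    """Group frequent itemsets by support set (exponential; toy data only)."""
    items = sorted(set().union(*transactions))
    classes = {}
    # k = 0 includes the empty itemset, which every transaction contains.
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            tids = support_set(itemset, transactions)
            if len(tids) >= min_support:
                classes.setdefault(tids, []).append(itemset)
    return classes


def closure_and_generators(members):
    """Closure: the unique maximal member. Generators: the minimal members."""
    closure = frozenset().union(*members)
    generators = [g for g in members if not any(m < g for m in members)]
    return closure, generators


if __name__ == "__main__":
    # Hypothetical toy dataset of four transactions over items a, b, c.
    transactions = [frozenset("abc"), frozenset("ab"),
                    frozenset("abc"), frozenset("c")]
    for tids, members in equivalence_classes(transactions, 2).items():
        closure, generators = closure_and_generators(members)
        print(sorted(tids), sorted(closure), [sorted(g) for g in generators])
```

On this toy data, the itemsets {a}, {b}, and {a, b} share the support set {0, 1, 2}, so they form one class whose closure is {a, b} and whose generators are {a} and {b}. A joint miner has to keep both ends of each such class up to date as the window changes.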
Notes
The term jumper was first introduced by Rouane et al. [34] in the context of incrementally computing the iceberg lattice of FCIs.
References
Aggarwal CC (2007) Data streams: models and algorithms, vol 31, Springer Science & Business Media
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings, ACM SIGMOD Conference on Management of Data, Washington, D.C., pp 207–216
Alam K et al (2017) Enabling far-edge analytics: performance profiling of frequent pattern mining algorithms. IEEE Access 5:8236–8249
Barbut M, Monjardet B (1970) Ordre et classification. Hachette
Benabderrahmane S et al (2021) A rule mining-based advanced persistent threats detection system. In: 30th IJCAI
Calders T et al (2004) A survey on condensed representations for frequent sets. Constraint Based Min Induc Databases 3848:64–80
Calders T et al (2014) Mining frequent itemsets in a stream. Inf Syst 39:233–255
Chi Y et al (2004) Moment: maintaining closed frequent itemsets over a stream sliding window. In: ICDM’04, IEEE, pp 59–66
Cormode G, Muthukrishnan S (2005) What’s new: finding significant differences in network data streams. IEEE/ACM Trans Netw 13(6):1219–1232
Eiter T, Gottlob G (1995) Identifying the minimal transversals of a hypergraph and related problems. SIAM J Comput 24(6):1278–1304
Ganter B, Wille R (1999) Formal concept analysis: mathematical foundations. Springer, Berlin/Heidelberg
Gao C, Wang J (2009) Efficient itemset generator discovery over a stream sliding window. In: 18th ACM CIKM, pp 355–364
Godin R, Missaoui R (1994) An incremental concept formation approach for learning from databases. Theor Comput Sci J 133:387–419
Godin R et al (1995) Incremental concept formation algorithms based on Galois (Concept) lattices. Comp Intell 11(2):246–267
Hamadi S et al (2016) Compiling packet forwarding rules for switch pipelined architecture. In: The 35th IEEE INFOCOM, IEEE, pp 1–9
Jiang N, Gruenwald L (2006) CFI-Stream: mining closed frequent itemsets in data streams. In: 12th ACM SIGKDD, ACM, pp 592–597
Johnson D et al (1988) On generating all maximal independent sets. Inf Process Lett 27(3):119–123
Karim R et al (2018) Mining maximal frequent patterns in transactional databases and dynamic data streams: a spark-based approach. Inf Sci 432:278–300
Kryszkiewicz M (2002) Concise representations of association rules. In: ESF Explor. WS on Pattern Detect. and Discov. pp 92–109
Li H-F et al (2006) A new algorithm for maintaining closed frequent itemsets in data streams by incremental updates. In: ICDM Workshops 2006, IEEE, pp 672–676
Li J (2006) Minimum description length principle: generators are preferable to closed patterns. In: AAAI, pp 409–414
Li J et al (2007) Mining statistically important equivalence classes and delta-discriminative emerging patterns. In: 13th ACM SIGKDD, pp 430–439
Liu G et al (2008) A new concise representation of frequent itemsets using generators and a positive border. Kn Inf Syst 17(1):35–56
Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Discov 1(3):241–258
Mao G et al (2007) Mining maximal frequent itemsets from data streams. J Inf Sci 33(3):251–262
Martin T et al (2020) Ciclad: a fast and memory-efficient closed itemset miner for streams. In: 26th ACM SIGKDD, pp 1810–1818
Martin T et al (2021) FGC-Stream: a novel joint miner for frequent generators and closed itemsets in data streams. In: 21st IEEE ICDM
McKeown N et al (2008) Openflow: enabling innovation in campus networks. ACM SIGCOMM Comput Commun Rev 38(2):69–74
Nehme K et al (2005) On computing the minimal generator family for concept lattices and icebergs. In: 3rd ICFCA, pp 192–207
Nunes B et al (2014) A survey of software-defined networking: past, present, and future of programmable networks. IEEE Commun Surv Tutor 16(3):1617–1634
Pasquier N et al (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1):25–46
Pfaltz J (2002) Incremental transformation of lattices: a key to effective knowledge discovery. In: 1st ICGT, pp 351–362
Rashid M et al (2013) Mining associated sensor patterns for data stream of wireless sensor networks. In: 8th ACM WS on Perform Monitoring and Measurement of Heterog Wireless and Wired Nets, pp 91–98
Rouane M et al (2004) On-line maintenance of iceberg concept lattices. In: Contributions to the 12th ICCS. p 14
Ryang H, Yun U (2016) High utility pattern mining over data streams with sliding window technique. Exp Syst Appl 57:214–231
Stumme G et al (2002) Computing iceberg concept lattices with titanic. Data Knowl Eng 42(2):189–222
Szathmary L et al (2007) Towards rare itemset mining. In: 19th IEEE ICTAI, 1:305–312
Szathmary L et al (2009) Efficient vertical mining of frequent closures and generators. In: 8th IDA, pp 393–404
Szathmary L et al (2014) A fast compound algorithm for mining generators, closed itemsets, and computing links between equivalence classes. Ann Math Artif Intell 70(1–2):81–105
Valiant L (1979) The complexity of enumeration and reliability problems. SIAM J Comput 8(3):410–421
Valtchev P et al (2002) Generating frequent itemsets incrementally: two novel approaches based on galois lattice theory. J Exp Theor Artif Intell 14(2–3):115–142
Valtchev P et al (2003) A generic scheme for the design of efficient on-line algorithms for lattices. In: 11th ICCS, pp 282–295
Valtchev P et al (2008) A framework for incremental generation of closed itemsets. Discrete Appl Math 156:924–949
Yen S et al (2009) An efficient algorithm for maintaining frequent closed itemsets over data stream. In: IEA/AIE, pp 767–776
Yen S et al (2011) A fast algorithm for mining frequent closed itemsets over stream sliding window. In: Fuzzy Systems (FUZZ), 2011 IEEE International Conference on, IEEE, pp 996–1002
Zaki MJ, Hsiao C-J (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478
Acknowledgements
The authors would like to thank Y. Chi for the code of MomentFP and E. Hamon for his insightful remarks.
About this article
Cite this article
Martin, T., Valtchev, P. & Roux, LR. Mining frequent generators and closures in data streams with FGC-Stream. Knowl Inf Syst 65, 3295–3335 (2023). https://doi.org/10.1007/s10115-023-01852-3
DOI: https://doi.org/10.1007/s10115-023-01852-3