Abstract
Pattern Mining over Data Streams (PMDS) is one of the most significant tasks in data mining. A major challenge is to define a representational framework that uniformly covers PMDS algorithms that deal with different pattern types (frequent itemset, high-utility itemset, uncertain frequent itemset), use different methods (test-and-generate, pattern-growth, hybrid), and rely on different window models (landmark, sliding, decay, tilted). Such a framework would help standardize the process, improve understanding of algorithm design, and provide a basis for unification and further research. It would also facilitate variability management and allow the derivation of tools for wide experimentation. In this paper, we propose a reference ontology to formalize the domain knowledge around PMDS. The design process of the ontology followed leading practices in ontology engineering. The ontology is aligned with the most popular data mining and machine learning ontologies and thus represents a major contribution toward PMDS domain ontologies.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Samb, D., Slimani, Y., Ndiaye, S. (2024). Toward an Ontology of Pattern Mining over Data Streams. In: Bennour, A., Bouridane, A., Chaari, L. (eds) Intelligent Systems and Pattern Recognition. ISPR 2023. Communications in Computer and Information Science, vol 1940. Springer, Cham. https://doi.org/10.1007/978-3-031-46335-8_12
DOI: https://doi.org/10.1007/978-3-031-46335-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46334-1
Online ISBN: 978-3-031-46335-8
eBook Packages: Computer Science (R0)