Abstract
In recent years, generalization-based data mining techniques have attracted growing interest among data scientists. Generalized itemset mining is an exploration technique that focuses on extracting high-level abstractions and correlations from a database. However, domain experts must still manage and interpret the large number of patterns extracted from a massive database of transactions. In generalized pattern mining, taxonomies that contain abstraction information are defined for each dataset, so the number of frequent patterns can grow enormously, and exploiting the resulting knowledge becomes a difficult and costly process. In this article, we introduce an approach that uses cardinality-based constraints with transaction ids and numeric encoding to mine generalized patterns. We use transaction ids to support the computation of each frequent itemset, and we encode taxonomies into a numeric type using two simple rules. We also combine cardinality constraints with closed or maximal patterns. Experiments show that our optimizations significantly improve the performance of the original method, and that the comprehensive information within closed and maximal patterns is worth considering in generalized frequent pattern mining.
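As a rough illustration of how transaction ids can support the computation of generalized itemsets, the sketch below uses the standard vertical (tidset) representation: each item, and each of its taxonomy ancestors, is mapped to the set of transaction ids that contain it, and the support of an itemset is the size of the intersection of those sets. This is a minimal sketch under stated assumptions, not the algorithm of the article: the toy transactions, the taxonomy, and the helper names `build_tidsets` and `support` are all hypothetical, and neither the cardinality-based constraints nor the two numeric encoding rules are modeled here.

```python
# Hypothetical toy data: transaction id -> set of leaf items.
transactions = {
    1: {"milk", "bread"},
    2: {"milk", "apple"},
    3: {"cheese", "bread"},
    4: {"apple", "bread"},
}

# Hypothetical taxonomy: leaf item -> parent (generalized) item.
taxonomy = {"milk": "dairy", "cheese": "dairy", "apple": "fruit"}


def build_tidsets(transactions, taxonomy):
    """Map every leaf and generalized item to the transaction ids containing it."""
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
            # Walk up the taxonomy so each ancestor also covers this transaction.
            parent = taxonomy.get(item)
            while parent is not None:
                tidsets.setdefault(parent, set()).add(tid)
                parent = taxonomy.get(parent)
    return tidsets


def support(itemset, tidsets):
    """Support count = size of the intersection of the items' tidsets."""
    return len(set.intersection(*(tidsets[item] for item in itemset)))


tidsets = build_tidsets(transactions, taxonomy)
print(support({"dairy", "bread"}, tidsets))  # 2 (transactions 1 and 3)
print(support({"milk", "bread"}, tidsets))   # 1 (transaction 1 only)
```

In this toy data the generalized itemset {dairy, bread} has a higher support than its specialization {milk, bread}, which shows why taxonomies enlarge the set of frequent patterns.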
Cite this article
Le, B., Luong, P. Optimized cardinality-based generalized itemset mining using transaction ID and numeric encoding. Appl Intell 48, 2067–2080 (2018). https://doi.org/10.1007/s10489-017-1058-1