
Optimized cardinality-based generalized itemset mining using transaction ID and numeric encoding

Applied Intelligence

Abstract

In recent years, generalization-based data mining techniques have attracted growing interest among data scientists. Generalized itemset mining is an exploration technique that extracts high-level abstractions and correlations from a database. However, domain experts must still manage and interpret the large number of patterns extracted from a massive database of transactions. In generalized pattern mining, taxonomies that hold abstraction information are defined for each dataset, so the number of frequent patterns can grow enormously, and exploiting the resulting knowledge becomes difficult and costly. In this article, we introduce an approach that uses cardinality-based constraints together with transaction IDs and numeric encoding to mine generalized patterns. We use transaction IDs to support the computation of each frequent itemset, and we encode taxonomies into a numeric type using two simple rules. We also combine cardinality constraints with closed or maximal patterns. Experiments show that our optimizations significantly improve the performance of the original method, and that the comprehensive information carried by closed and maximal patterns is worth considering in generalized frequent pattern mining.
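As a rough illustration of how transaction IDs can support itemset computation in the presence of a taxonomy, the sketch below uses the common vertical (tidset) representation: each item, leaf or generalized, maps to the set of transaction IDs that contain it, and the support count of an itemset is the size of the intersection of those sets. The toy transactions, the taxonomy, and the function names `build_tidsets` and `support` are all hypothetical. The sketch does not reproduce the article's numeric encoding rules or its cardinality-based constraints, which the abstract does not specify.

```python
# Hypothetical toy data: transaction id -> set of leaf items.
transactions = {
    1: {"apple", "milk"},
    2: {"pear", "milk"},
    3: {"apple", "bread"},
    4: {"pear", "bread", "milk"},
}

# Hypothetical taxonomy: child -> parent (items without an entry are roots).
taxonomy = {"apple": "fruit", "pear": "fruit", "milk": "dairy"}


def build_tidsets(transactions, taxonomy):
    """Map each item, leaf or generalized, to the ids of transactions containing it."""
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            node = item
            # Walk up the taxonomy so every ancestor also records this transaction.
            while node is not None:
                tidsets.setdefault(node, set()).add(tid)
                node = taxonomy.get(node)
    return tidsets


def support(itemset, tidsets):
    """Support count: size of the intersection of the items' tidsets."""
    sets = [tidsets.get(item, set()) for item in itemset]
    # An empty itemset is treated as having support 0 here, purely for simplicity.
    return len(set.intersection(*sets)) if sets else 0


tidsets = build_tidsets(transactions, taxonomy)
print(support({"fruit", "milk"}, tidsets))  # generalized itemset
print(support({"apple", "milk"}, tidsets))  # leaf-level itemset
```

Because a generalized item's tidset is the union of its descendants' tidsets, a generalized itemset such as {fruit, milk} can never have lower support than any of its specializations. This is one reason the number of frequent patterns grows once a taxonomy is introduced.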



Author information

Corresponding author

Correspondence to Bac Le.


About this article


Cite this article

Le, B., Luong, P. Optimized cardinality-based generalized itemset mining using transaction ID and numeric encoding. Appl Intell 48, 2067–2080 (2018). https://doi.org/10.1007/s10489-017-1058-1


  • DOI: https://doi.org/10.1007/s10489-017-1058-1
