A time-efficient breadth-first level-wise lattice-traversal algorithm to discover rare itemsets

Troiano, Luigi; Scibelli, Giacomo

doi:10.1007/s10618-013-0304-3

Published: 12 May 2013

Volume 28, pages 773–807, (2014)

Data Mining and Knowledge Discovery

Abstract

In this paper we address the problem of searching for rare itemsets. A main issue is the strategy to adopt in exploring the power set lattice. Assuming a power set lattice with the full set at the top and the empty set at the bottom, most algorithms adopt a bottom-up exploration, i.e. they move from smaller to larger sets. Although this approach is advantageous for frequent itemsets, it may not be well suited to rare itemsets, as they occur at the top of the lattice. We propose Rarity, a top-down breadth-first level-wise algorithm. Experimental results and comparisons are presented to give a quantitative characterization of the algorithm’s performance and complexity. The algorithm is applied to some UCI benchmark and real-world datasets, and a parallelization of the algorithm is outlined. Experiments showed that this approach finds all rare non-zero itemsets in less time than other solutions, at the expense of higher memory demand.
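The top-down, level-wise idea described in the abstract can be illustrated with a small sketch. This is not the authors’ Rarity algorithm, whose details are not given on this page; it is a minimal generic traversal written under stated assumptions. It relies on the fact that every superset of a rare itemset is also rare, so the search starts from the transactions themselves and descends only through rare itemsets. The function name `rare_itemsets_top_down` and the parameter `min_count` (the support count below which an itemset is considered rare) are hypothetical names chosen for illustration.

```python
from itertools import combinations


def rare_itemsets_top_down(transactions, min_count):
    """Return {itemset: support count} for every itemset whose support
    count is non-zero and strictly below min_count (assumed threshold)."""
    tx = [frozenset(t) for t in transactions]

    def support_count(itemset):
        # Number of transactions that contain the itemset.
        return sum(1 for t in tx if itemset <= t)

    rare = {}
    if not tx:
        return rare

    k = max(len(t) for t in tx)  # start at the highest occupied level
    level = set()
    while k > 0:
        # Transactions of size k enter the traversal at level k.
        level |= {t for t in tx if len(t) == k}
        next_level = set()
        for itemset in level:
            count = support_count(itemset)
            if count < min_count:
                rare[itemset] = count
                # Only a rare itemset can have rare subsets, so frequent
                # itemsets are not expanded any further.
                if k > 1:
                    next_level.update(
                        frozenset(c) for c in combinations(itemset, k - 1)
                    )
        level = next_level
        k -= 1
    return rare


example = [{"a", "b", "c"}, {"a", "b"}, {"a", "b"}, {"a"}]
for itemset, count in rare_itemsets_top_down(example, min_count=2).items():
    print(sorted(itemset), count)
```

Each level is held in memory as a set of candidates before the next one is generated, which is in keeping with the abstract’s remark that the time savings come at the expense of higher memory demand.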

Notes

The terms “support” and “support count” take on different meanings in data mining. In the context of this work, they refer to how frequent an itemset is. When this is computed as a number of occurrences, “support count” is the more appropriate term. More generally, we use “support” to express itemset occurrence.
The name of the property comes from the fact that the set of frequent itemsets is closed with respect to set inclusion.
An itemset \(X\) is said to be closed iff it is the largest subset of items common to the transactions in which \(X\) appears (Zaki et al. 1997).
A generator is a frequent itemset with none of its proper subsets having its same support.
They are actually sets, as they do not admit duplicates, but for the sake of simplicity, we refer to them as lists.
http://archive.ics.uci.edu/ml/datasets/Mushroom.
http://www.almaden.ibm.com/software/quest/Resources.
http://www.census.gov/.
http://web.ist.utl.pt/~acardoso/datasets/
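The definitions given in the notes above (support count, closed itemset, generator) can be made concrete with a short sketch. The helper names `support_count`, `closure`, `is_closed` and `is_generator`, and the `min_count` frequency threshold, are assumptions introduced for illustration; they are not taken from the paper.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Largest set of items shared by all transactions containing itemset.
    Returns None when no transaction contains the itemset."""
    covering = [frozenset(t) for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def is_closed(itemset, transactions):
    """An itemset is closed iff it equals its own closure."""
    return closure(itemset, transactions) == itemset


def is_generator(itemset, transactions, min_count):
    """A frequent itemset none of whose proper subsets has the same support.
    Checking the subsets obtained by dropping one item is enough, because
    support can only grow when items are removed."""
    count = support_count(itemset, transactions)
    if count < min_count:
        return False
    return all(
        support_count(itemset - {item}, transactions) > count
        for item in itemset
    )


transactions = [frozenset("abc"), frozenset("ab"), frozenset("ab"), frozenset("a")]
print(support_count(frozenset("ab"), transactions))
print(sorted(closure(frozenset("b"), transactions)))
print(is_generator(frozenset("b"), transactions, min_count=2))
```

In this toy dataset, every transaction containing `b` also contains `a`, so `{b}` is not closed while `{a, b}` is; conversely `{b}` is a generator and `{a, b}` is not, since `{b}` already has the same support.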

References

Adda M, Wu L, Feng Y (2007) Rare itemset mining. In: Proceedings of 6th International Conference on Machine Learning and Applications, ICMLA ’07. IEEE Computer Society, Washington, DC, pp 73–80
Agrawal R, Imieliński T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925
Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Int Conf Manag Data 22:207–216
Agrawal R, Mannila H, Srikant R, Toivonen H, Inkeri Verkamo A (1996) Fast discovery of association rules. In: Advances in knowledge discovery and data mining. AAAI/MIT Press, Cambridge
Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowl Data Eng 8(6):962–969
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: 20th VLDB Conference
Bastide Y, Taouil R, Pasquier N, Stumme G, Lakhal L (2000) Mining frequent patterns with counting inference. SIGKDD Explor Newsl 2(2):66–75
Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. In: SIGMOD ’97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. ACM, New York, pp 255–264
Burdick D, Calimlim M, Gehrke J (2001) Mafia: a maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, Washington, DC, pp 443–452
Forina M (1991) Wine dataset. http://archive.ics.uci.edu/ml/datasets/wine. Accessed 5 Nov 2012
Haglin DJ, Manning AM (2007) On minimal infrequent itemset mining. In: DMIN. CSREA Press, Las Vegas, pp 141–147
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. In: Mannila H (ed) Data mining and knowledge discovery. Kluwer, New York, pp 53–87
Koh YS, Rountree N (2005) Finding sporadic rules using apriori-inverse. In: PAKDD. Springer, New York, pp 97–106
Koh YS, Rountree N, O’Keefe RA (2008) Mining interesting imperfectly sporadic rules. Knowl Inf Syst 14(2):179–196
Liu B, Hsu W, Ma Y (1999) Mining association rules with multiple minimum supports. In: KDD ’99: Proceedings of 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, pp 337–341
Mannila H, Toivonen H, Verkamo I (1994) Efficient algorithms for discovering association rules. In: KDD ’94: Proceedings of the AAAI Workshop on Knowledge Discovery in Databases. AAAI Press, Seattle, pp 181–192
Nakai K (1996a) Ecoli dataset. http://archive.ics.uci.edu/ml/datasets/ecoli. Accessed 5 Nov 2012
Nakai K (1996b) Yeast dataset. http://archive.ics.uci.edu/ml/datasets/yeast. Accessed 5 Nov 2012
Park JS, Chen M-S, Yu PS (1995) Efficient parallel data mining for association rules. In: CIKM ’95: Proceedings of 4th International Conference on Information and Knowledge Management. ACM, New York, pp 31–36
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Closed set based discovery of small covers for association rules. In: Proceedings 15emes Journees Bases de Donnees Avancees. BDA, pp 361–381
Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-mine: hyper-structure mining of frequent patterns in large databases. In: ICDM ’01: Proceedings of the 2001 IEEE International Conference on Data Mining. Washington, DC, pp 441–448
Piatetsky-Shapiro G, Frawley WJ (eds) (1991) Knowledge discovery in databases. AAAI/MIT Press, Cambridge
Savasere A, Omiecinski E, Navathe SB (1995) An efficient algorithm for mining association rules in large databases. In: VLDB ’95: Proceedings of 21st International Conference on Very Large Data Bases. Morgan Kaufmann, San Francisco, pp 432–444
Shenoy P, Haritsa JR, Sudarshan S, Bhalotia G, Bawa M, Shah D (2000) Turbo-charging vertical mining of large databases. SIGMOD Rec 29(2):22–33
Song M, Rajasekaran S (2006) A transaction mapping algorithm for frequent itemsets mining. IEEE Trans Knowl Data Eng 18(4):472–481
Szathmary L, Napoli A, Kuznetsov SO (2007) ZART: a multifunctional itemset mining algorithm. In: Proceedings of the 5th International Conference on Concept Lattices and Their Applications (CLA ’07). Montpellier, pp 26–37
Szathmary L, Napoli A, Valtchev P (2007) Towards rare itemset mining. In: ICTAI ’07: Proceedings of 19th IEEE International Conference on Tools with Artificial Intelligence. Washington, DC, pp 305–312
Troiano L, Scibelli G, Birtolo C (2009) A fast algorithm for mining rare itemsets. In: ISDA’09, pp 1149–1155
Tsang S, Koh YS, Dobbie G (2011) Rp-tree: rare pattern tree mining. In: Proceedings of CLA, pp 277–288
Uno T, Asai T, Uchida Y, Arimura H (2003) Lcm: an efficient algorithm for enumerating frequent closed item sets. In: FIMI03: Proceedings of Workshop on Frequent Itemset Mining Implementations
Uno T, Kiyomi M, Arimura H (2004) Lcm ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: FIMI ’04, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations
Uno T, Kiyomi M, Arimura H (2005) Lcm ver. 3: collaboration of array, bitmap and prefix tree for frequent itemset mining. In: Proceedings of 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations, ACM, New York, pp 77–86
Weiss GM (2004) Mining with rarity: a unifying framework. SIGKDD Explor Newsl 6(1):7–19
Yang G (2004) The complexity of mining maximal frequent itemsets and maximal frequent patterns. In: KDD ’04: Proceedings of 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, pp 344–353
Yun H, Ha D, Hwang B, Ryu KH (2003) Mining association rules on significant rare data using relative support. J Syst Softw 67(3):181–191
Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: KDD ’03: Proceedings of 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, pp 326–335
Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. Technical report, Rochester

Author information

Authors and Affiliations

Department of Engineering, University of Sannio, 81031, Benevento, Italy
Luigi Troiano & Giacomo Scibelli

Corresponding author

Correspondence to Luigi Troiano.

Additional information

Responsible editor: M. J. Zaki.

About this article

Cite this article

Troiano, L., Scibelli, G. A time-efficient breadth-first level-wise lattice-traversal algorithm to discover rare itemsets. Data Min Knowl Disc 28, 773–807 (2014). https://doi.org/10.1007/s10618-013-0304-3

Received: 09 July 2011
Accepted: 06 February 2013
Published: 12 May 2013
Issue Date: May 2014
DOI: https://doi.org/10.1007/s10618-013-0304-3
