Computing the minimum-support for mining frequent patterns

Zhang, Shichao; Wu, Xindong; Zhang, Chengqi; Lu, Jingli

doi:10.1007/s10115-007-0081-7

Regular Paper
Published: 06 April 2007

Volume 15, pages 233–257, (2008)
Knowledge and Information Systems

Abstract

Frequent pattern mining assumes that users can specify the minimum-support for mining their databases. However, setting the minimum-support is widely recognized as a difficult task for users, which can hinder the widespread application of these algorithms. In this paper we propose a computational strategy for identifying frequent itemsets, consisting of polynomial approximation and fuzzy estimation. More specifically, our algorithms (polynomial approximation and fuzzy estimation) automatically generate actual minimum-supports (appropriate to the database to be mined) according to users’ mining requirements. We experimentally examine the algorithms using different datasets, and demonstrate that our fuzzy estimation algorithm closely approximates actual minimum-supports from the commonly-used requirements.
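To make the role of the minimum-support concrete, the following is a minimal sketch of the standard definition of support and of how the threshold controls which itemsets count as frequent. It is not the polynomial approximation or fuzzy estimation algorithm proposed in the paper, whose details are not given on this page. The function names (`support`, `frequent_itemsets`) and the toy transactions are assumptions made purely for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support.

    Illustrative only: the candidate space is exponential in the number
    of items, so this is suitable for toy data, not real databases.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


# Hypothetical toy database of four transactions.
transactions = [
    frozenset(t)
    for t in (
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    )
]

# The same data yields very different result sizes as the threshold moves,
# which is why choosing it by hand is hard.
for threshold in (0.25, 0.5, 0.75):
    found = frequent_itemsets(transactions, threshold)
    print(threshold, len(found))
```

On this toy data the number of frequent itemsets drops from 7 to 5 to 2 as the threshold rises from 0.25 to 0.5 to 0.75. A user who cannot anticipate this sensitivity has little basis for picking a value, which is the difficulty the abstract describes.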

Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, Guangxi Normal University, Guilin, 541004, People’s Republic of China
Shichao Zhang
Department of Computer Science, University of Vermont, Burlington, VT, 05405, USA
Xindong Wu
Faculty of Information Technology, University of Technology, Sydney, PO Box 123, Broadway, NSW, 2007, Australia
Chengqi Zhang
Institute of Information Sciences and Technology, Massey University, Palmerston North, New Zealand
Jingli Lu

Corresponding author

Correspondence to Shichao Zhang.

Additional information

This work is partially supported by Australian ARC grants for discovery projects (DP0449535, DP0559536 and DP0667060), a China NSF Major Research Program (60496327), a China NSF grant (60463003), an Overseas Outstanding Talent Research Program of the Chinese Academy of Sciences (06S3011S01), and an Overseas-Returning High-level Talent Research Program of China Human-Resource Ministry.

A preliminary and shortened version of this paper has been published in the Proceedings of the 8th Pacific Rim International Conference on Artificial Intelligence (PRICAI ’04).

About this article

Cite this article

Zhang, S., Wu, X., Zhang, C. et al. Computing the minimum-support for mining frequent patterns. Knowl Inf Syst 15, 233–257 (2008). https://doi.org/10.1007/s10115-007-0081-7

Received: 23 January 2006
Revised: 12 October 2006
Accepted: 08 March 2007
Published: 06 April 2007
Issue Date: May 2008
DOI: https://doi.org/10.1007/s10115-007-0081-7
