Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules

Jia, Cai-Yan; Gao, Xie-Ping

doi:10.1007/s11390-005-0309-5

Published: May 2005

Journal of Computer Science and Technology
Volume 20, pages 309–318 (2005)

Cai-Yan Jia^1,2 & Xie-Ping Gao^3

Abstract

One obstacle to efficient association rule mining is the explosive growth of data sets, since scanning a large database is costly or even impossible, especially when multiple passes are required. A popular way to improve the speed and scalability of association rule mining is to run the algorithm on a random sample instead of the entire database. However, how to define effectively and estimate efficiently the degree of error in the outcome of the algorithm, and how to determine the sample size needed, remain open research problems. In this paper, an effective and efficient algorithm based on PAC (Probably Approximately Correct) learning theory is given to measure and estimate the sample error. Then a new adaptive, on-line, fast sampling strategy, multi-scaling sampling, inspired by MRA (Multi-Resolution Analysis) and the Shannon sampling theorem, is presented for quickly obtaining acceptably approximate association rules at an appropriate sample size. Both theoretical analysis and empirical study show that the sampling strategy achieves a very good speed-accuracy trade-off.
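
To make the sample-size question concrete, the following is a minimal sketch and not the multi-scaling sampling algorithm of the paper, whose details are not given in the abstract. It assumes the standard Hoeffding bound as a PAC-style guarantee for the support of one fixed itemset, and a simple doubling schedule with a heuristic stopping rule. All function names and parameters (`hoeffding_sample_size`, `support`, `progressive_support`, `start`, `seed`) are hypothetical and chosen only for illustration.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Sample size n such that, for ONE fixed itemset, the sample support
    deviates from the true support by more than epsilon with probability
    at most delta (Hoeffding bound: n >= ln(2/delta) / (2 * epsilon^2))."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def progressive_support(transactions, itemset, epsilon, start=100, seed=0):
    """Assumed heuristic, not the paper's method: double the sample size
    until two successive support estimates differ by at most epsilon,
    or until the sample covers the whole database."""
    rng = random.Random(seed)
    n = min(start, len(transactions))
    previous = None
    while True:
        sample = rng.sample(transactions, n)  # sampling without replacement
        estimate = support(sample, itemset)
        if previous is not None and abs(estimate - previous) <= epsilon:
            return estimate, n
        if n == len(transactions):
            return estimate, n
        previous = estimate
        n = min(2 * n, len(transactions))
```

For example, `hoeffding_sample_size(0.01, 0.05)` gives the number of transactions that suffices for an error of at most 0.01 with confidence 0.95 on a single itemset, independent of the database size. A bound that holds for many itemsets at once needs a smaller `delta` per itemset (for instance via a union bound), which is one reason adaptive schedules are attractive in practice.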

Author information

Authors and Affiliations

The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, P.R. China
Cai-Yan Jia
Graduate School of the Chinese Academy of Sciences, Beijing, 100039, P.R. China
Cai-Yan Jia
Information Engineering College, Xiangtan University, Xiangtan, 411105, P.R. China
Xie-Ping Gao

Corresponding author

Correspondence to Cai-Yan Jia.

Additional information

Regular Paper. This work is partially supported by the CAS Project of Brain and Mind Science, the Pre-973 Project 2001CCA03000, the National High Technology 863 Program of China under Grant No.2001AA113130, the National Basic Research 973 Program of China under Grant No.2001CB312004, the Innovation Foundation of IOM, AMSS and ICT Projects, the National Natural Science Foundation of China under Grant Nos.69733020 and 60375021, and the Natural Science Foundation of Hunan Province under Grant No.03JJY3096.

Cai-Yan Jia is engaged in postdoctoral study at the Department of Computer Science and Engineering of Fudan University. She received the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, in July 2004 and the M.S. degree from the Department of Mathematics of Xiangtan University, P.R. China, in July 2001. Her recent research interests include data mining, machine learning, computational intelligence and bioinformatics. She has published several papers in conferences and journals.

Xie-Ping Gao received the B.S. and M.S. degrees from Xiangtan University, P.R. China, in 1985 and 1988, respectively, and the Ph.D. degree from Hunan University, P.R. China, in 2003. Since July 1999, he has been a professor with the Mathematical Department and the Information Engineering College, Xiangtan University. From December 2002 to December 2003, he was a visiting professor at the School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore. His current research interests are in the areas of wavelet analysis, neural networks, evolutionary computation, data mining, and image compression. He has co-authored more than 60 journal papers, conference papers, and book chapters.

About this article

Cite this article

Jia, CY., Gao, XP. Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules. J Comput Sci Technol 20, 309–318 (2005). https://doi.org/10.1007/s11390-005-0309-5

Received: 08 July 2003
Revised: 20 February 2004
Issue Date: May 2005
DOI: https://doi.org/10.1007/s11390-005-0309-5
