Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules

Data Mining and Knowledge Discovery

Abstract

Sampling is a naturally appealing technique for data mining, because in most cases an approximate solution already satisfies users' needs. We apply sampling techniques to the problem of maintaining discovered association rules. Several studies have addressed how to maintain discovered association rules when the database is updated. All of the proposed methods must examine not only the changed part of the database but also the unchanged part of the original database, which is very large, and hence take much time. Worse yet, if the rules are updated frequently but the underlying rule set has not changed much, the effort is mostly wasted. In this paper, we devise an algorithm that uses sampling to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to decide whether the mined association rules should be updated. If the estimated difference is small, the rules from the original database are still a good approximation to those in the updated database, so we need not spend resources updating them. Instead, we can accumulate more updates before actually updating the rules, thereby avoiding the overhead of updating them too frequently. Experimental results show that our algorithm is very efficient and highly accurate.
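The decision procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the drift measure (the fraction of previously frequent itemsets whose sampled support falls below the threshold), and the parameters `min_support`, `sample_size`, and `drift_threshold` are all assumptions made for illustration. In particular, this sketch only detects itemsets that stop being frequent and ignores newly frequent ones.

```python
import random


def estimate_support(sample, itemset):
    """Fraction of sampled transactions that contain every item in itemset."""
    if not sample:
        raise ValueError("sample must contain at least one transaction")
    hits = sum(1 for transaction in sample if itemset <= transaction)
    return hits / len(sample)


def estimate_rule_drift(old_frequent, updated_db, min_support, sample_size, seed=0):
    """Estimate how many previously frequent itemsets are no longer frequent.

    old_frequent: itemsets (frozensets) that were frequent before the update.
    updated_db:   list of transactions (frozensets) after the update.
    Returns the fraction of old_frequent whose sampled support is below
    min_support. This drift measure is an illustrative assumption.
    """
    if not old_frequent:
        return 0.0
    rng = random.Random(seed)
    # Sample without replacement instead of scanning the whole database.
    k = min(sample_size, len(updated_db))
    sample = rng.sample(updated_db, k)
    lost = [s for s in old_frequent if estimate_support(sample, s) < min_support]
    return len(lost) / len(old_frequent)


def should_update(drift, drift_threshold):
    """Re-mine the rules only when the estimated drift exceeds the threshold."""
    return drift > drift_threshold
```

With a small `drift_threshold`, the rules are re-mined as soon as a few itemsets appear to have changed status; with a larger one, more updates accumulate before the full maintenance step runs.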

Cite this article

Lee, S., Cheung, D.W. & Kao, B. Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules. Data Mining and Knowledge Discovery 2, 233–262 (1998). https://doi.org/10.1023/A:1009703019684

  • DOI: https://doi.org/10.1023/A:1009703019684
