
Active Mining in a Distributed Setting

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1759)

Abstract

Most current work in data mining assumes that the data is static, so that a database update requires re-mining both the old and the new data. In this article, we propose an alternative approach. We outline a general strategy by which data mining algorithms can be made active, i.e., able to maintain valid mined information in the presence of user interaction and database updates. We describe a runtime framework that allows efficient caching and sharing of data among clients and servers. We then demonstrate how existing algorithms for four key mining tasks (discretization, association mining, sequence mining, and similarity discovery) can be re-architected to maintain valid mined information across (i) database updates and (ii) user interactions in a client-server setting, while minimizing the amount of data re-accessed.
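The central idea of keeping mined results valid as data is added or removed and as the user changes query parameters can be illustrated with a small sketch. The Python class below is a hypothetical illustration, not the authors' algorithm or runtime framework. The names ActivePairMiner, add, remove, and frequent are invented here, and the sketch assumes transactions are lists of string items. It keeps exact support counts for every 1-itemset and 2-itemset. As a result, a batch of inserted or deleted transactions touches only that batch, and a user can re-query with a different minimum support without any pass over the stored data.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator, List

Transaction = List[str]


class ActivePairMiner:
    """Hypothetical sketch: exact support counts for all 1- and 2-itemsets,
    maintained incrementally so mined results stay valid across updates."""

    def __init__(self) -> None:
        self.n_transactions = 0
        self.counts: Counter = Counter()  # itemset (frozenset) -> support count

    @staticmethod
    def _itemsets(txn: Iterable[str]) -> Iterator[FrozenSet[str]]:
        # Enumerate every singleton and unordered pair contained in one transaction.
        items = sorted(set(txn))
        for item in items:
            yield frozenset([item])
        for a, b in combinations(items, 2):
            yield frozenset((a, b))

    def add(self, transactions: Iterable[Transaction]) -> None:
        # Database insert: only the new batch is scanned; old data is not re-read.
        for txn in transactions:
            self.n_transactions += 1
            self.counts.update(self._itemsets(txn))

    def remove(self, transactions: Iterable[Transaction]) -> None:
        # Database delete: assumes each transaction was previously added.
        for txn in transactions:
            self.n_transactions -= 1
            self.counts.subtract(self._itemsets(txn))

    def frequent(self, min_support: float) -> Dict[FrozenSet[str], int]:
        # User interaction: any support threshold is answered from cached counts.
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c > 0 and c >= threshold}
```

For example, adding [["a", "b", "c"], ["a", "b"], ["b", "c"]] and calling frequent(0.6) returns the singletons a, b, c and the pairs {a, b} and {b, c}. After appending ["a", "c"], calling frequent(0.5) also reports {a, c}, and no old transaction is re-read. The trade-off is memory: counting every pair grows quadratically with transaction width. Incremental algorithms therefore typically track a much smaller candidate set and re-access old data only when an itemset they were not counting becomes promising.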




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Parthasarathy, S., Dwarkadas, S., Ogihara, M. (2000). Active Mining in a Distributed Setting. In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_4


  • DOI: https://doi.org/10.1007/3-540-46502-2_4


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67194-7

  • Online ISBN: 978-3-540-46502-7

  • eBook Packages: Springer Book Archive
