
Scalable Inductive Learning on Partitioned Data

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3488)


Abstract

With the rapid advancement of information technology, scalability has become a necessity for learning algorithms that deal with large, real-world data repositories. In this paper, scalability is achieved through a data reduction technique that partitions a large data set into subsets, applies a learning algorithm to each subset sequentially or concurrently, and then integrates the learned results. Five strategies for achieving scalability (Rule-Example Conversion, Rule Weighting, Iteration, Good Rule Selection, and Data Dependent Rule Selection) are identified, and seven corresponding scalable schemes are designed and developed. A substantial number of experiments have been performed to evaluate these schemes. The results demonstrate that, through data reduction, some of our schemes can effectively combine the weak classifiers learned from data subsets into accurate classifiers. Furthermore, our schemes require significantly less training time than generating a single global classifier from the full data set.
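The partition-learn-integrate pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the simple one-rule (1R) base learner stands in for the paper's rule learners, the accuracy-weighted vote loosely corresponds to the Rule Weighting strategy, and all function names are hypothetical.

```python
import random
from collections import Counter

def one_r(train):
    """Learn a one-rule (1R) classifier: pick the single feature whose
    value -> majority-class mapping is most accurate on the training subset."""
    best_feat, best_rule, best_acc = 0, {}, -1.0
    n_feats = len(train[0][0])
    for f in range(n_feats):
        by_val = {}
        for x, y in train:
            by_val.setdefault(x[f], Counter())[y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_val.items()}
        acc = sum(rule[x[f]] == y for x, y in train) / len(train)
        if acc > best_acc:
            best_feat, best_rule, best_acc = f, rule, acc
    default = Counter(y for _, y in train).most_common(1)[0][0]
    return best_feat, best_rule, default

def predict(model, x):
    feat, rule, default = model
    return rule.get(x[feat], default)

def partitioned_learn(data, k, seed=0):
    """Partition `data` into k disjoint subsets, learn a weak classifier on
    each, and weight it by its accuracy on its own partition."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::k] for i in range(k)]
    ensemble = []
    for part in parts:
        model = one_r(part)
        weight = sum(predict(model, x) == y for x, y in part) / len(part)
        ensemble.append((model, weight))
    return ensemble

def classify(ensemble, x):
    """Integrate the subset classifiers by accuracy-weighted voting."""
    votes = Counter()
    for model, w in ensemble:
        votes[predict(model, x)] += w
    return votes.most_common(1)[0][0]
```

Because each subset classifier is trained independently, the k learning runs can proceed sequentially or concurrently, as the abstract notes; only the final voting step needs all k results.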



References

  1. Michalski, R., Mozetic, I., Hong, J., Lavrac, N.: The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In: Proc. of AAAI (1986)

  2. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)

  3. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont, CA (1984)

  4. Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–285 (1989)

  5. Breiman, L.: Bagging predictors. TR 421, Dept. of Statistics, UC Berkeley, CA (1994)

  6. Wu, X.: Knowledge Acquisition from Databases. Ablex Publishing Corp., Greenwich (1995)

  7. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: An overview. In: Advances in Knowledge Discovery and Data Mining, Menlo Park (1996)

  8. Wu, X., Lo, W.: Multi-layer incremental induction. In: Proc. of PRICAI, pp. 24–32 (1998)

  9. Provost, F., Kolluri, V.: Scaling up inductive algorithms: An overview. In: Proc. of KDD, CA, pp. 239–242 (1997)

  10. Provost, F., Kolluri, V.: A survey of methods for scaling up inductive algorithms. Data Mining and Knowledge Discovery (1999)

  11. Huber, P.: From large to huge: A statistician's reaction to KDD and DM. In: Proc. of KDD, CA, pp. 304–308 (1997)

  12. Chan, P.: An extensible meta-learning approach for scalable and accurate inductive learning. Ph.D. thesis, Columbia Univ. (1996)

  13. Cheung, D., Ng, V., Fu, A., Fu, Y.: Efficient mining of association rules in distributed databases. IEEE Transactions on Knowledge and Data Engineering 8(6), 911–922 (1996)

  14. Wu, X., Zhang, S.: Synthesizing high-frequency rules from different data sources. IEEE Transactions on Knowledge and Data Engineering 15(2), 353–367 (2003)

  15. Quinlan, J.: Learning efficient classification procedures and their application to chess endgames. In: Machine Learning: An AI Approach. Morgan Kaufmann, CA (1983)

  16. Fürnkranz, J.: Integrative windowing. Journal of Artificial Intelligence Research 8, 129–164 (1998)

  17. Blake, C., Merz, C.: UCI Repository of machine learning databases (1998)

  18. IBM Almaden Research: Synthetic data generator, http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#classSynData

  19. Zhu, X., Wu, X., Chen, Q.: Eliminating class noise in large datasets. In: Proc. of ICML, pp. 920–927 (2003)




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, Q., Wu, X., Zhu, X. (2005). Scalable Inductive Learning on Partitioned Data. In: Hacid, M.-S., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_41


  • DOI: https://doi.org/10.1007/11425274_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25878-0

  • Online ISBN: 978-3-540-31949-8

  • eBook Packages: Computer Science; Computer Science (R0)
