Abstract
With the rapid advancement of information technology, scalability has become a necessity for learning algorithms to deal with large, real-world data repositories. In this paper, scalability is accomplished through a data reduction technique, which partitions a large data set into subsets, applies a learning algorithm to each subset sequentially or concurrently, and then integrates the learned results. Five strategies to achieve scalability (Rule-Example Conversion, Rule Weighting, Iteration, Good Rule Selection, and Data Dependent Rule Selection) are identified, and seven corresponding scalable schemes are designed and developed. A substantial number of experiments have been performed to evaluate these schemes. Experimental results demonstrate that, through data reduction, some of our schemes can effectively combine the weak classifiers learned from data subsets into accurate classifiers. Furthermore, our schemes require significantly less training time than that required to build a global classifier.
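The partition-learn-integrate pattern described above can be sketched in a few lines. The following is a minimal illustration, not the authors' algorithms: the toy 1R-style rule learner and all function names are assumptions, but the overall flow (partition the data, learn weighted rules on each subset, integrate by weighted voting, as in the Rule Weighting strategy) mirrors the abstract.

```python
# Hedged sketch of partitioned learning with rule-weight integration.
# The 1R-style learner below is a stand-in for any base learner.
import random
from collections import Counter

def partition(data, k):
    """Split a dataset into k roughly equal subsets."""
    data = list(data)
    random.shuffle(data)
    return [data[i::k] for i in range(k)]

def learn_rules(subset):
    """Toy learner: for each (feature_index, value) pair, predict the
    majority class among matching examples in this subset. Each rule is
    weighted by its accuracy on the subset (Rule Weighting)."""
    counts = {}
    for x, y in subset:
        for i, v in enumerate(x):
            counts.setdefault((i, v), Counter())[y] += 1
    rules = {}
    for key, ctr in counts.items():
        label, hits = ctr.most_common(1)[0]
        rules[key] = (label, hits / sum(ctr.values()))  # (prediction, weight)
    return rules

def integrate_predict(models, x):
    """Integrate the weak classifiers: every matching rule from every
    subset model casts a vote proportional to its weight."""
    votes = Counter()
    for model in models:
        for i, v in enumerate(x):
            if (i, v) in model:
                label, w = model[(i, v)]
                votes[label] += w
    return votes.most_common(1)[0][0] if votes else None

# Usage: learn on 3 partitions of a tiny categorical dataset.
data = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
        (("rain", "mild"), "yes"), (("rain", "cool"), "yes"),
        (("overcast", "hot"), "yes"), (("overcast", "cool"), "yes")]
random.seed(0)
models = [learn_rules(s) for s in partition(data, 3)]
print(integrate_predict(models, ("overcast", "cool")))  # -> "yes"
```

Because each subset model is learned independently, the per-subset learning step can run concurrently, which is where the training-time savings over a single global classifier come from.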
References
Michalski, R., Mozetic, I., Hong, J., Lavrac, N.: The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In: Proc. of AAAI (1986)
Quinlan, J.: C4.5: programs for machine learning. Morgan Kaufmann, San Francisco (1993)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont, CA (1984)
Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–285 (1989)
Breiman, L.: Bagging predictors. Technical Report 421, Dept. of Statistics, UC Berkeley, CA (1994)
Wu, X.: Knowledge Acquisition from Databases. Ablex Publishing Corp., Greenwich (1995)
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: An overview. In: Advances in Knowledge Discovery and Data Mining, Menlo Park (1996)
Wu, X., Lo, W.: Multi-layer incremental induction. In: Proc. of PRICAI, pp. 24–32 (1998)
Provost, F., Kolluri, V.: Scaling up inductive algorithms: An overview. In: Proc. of KDD, CA, pp. 239–242 (1997)
Provost, F., Kolluri, V.: A survey of methods for scaling up inductive algorithms. Data Mining and Knowledge Discovery (1999)
Huber, P.: From large to huge: A statistician's reaction to KDD and DM. In: Proc. of KDD, CA, pp. 304–308 (1997)
Chan, P.: An extensible meta-learning approach for scalable and accurate inductive learning. Ph.D. thesis, Columbia Univ. (1996)
Cheung, D., Ng, V., Fu, A., Fu, Y.: Efficient Mining of Association Rules in Distributed Databases. IEEE Transactions on Knowledge and Data Engineering 8(6), 911–922 (1996)
Wu, X., Zhang, S.: Synthesizing High-Frequency Rules from Different Data Sources. IEEE Transactions on Knowledge and Data Engineering 15(2), 353–367 (2003)
Quinlan, J.: Learning efficient classification procedures and their application to chess end games. In: Machine Learning: An AI Approach. Morgan Kaufmann, CA (1983)
Fürnkranz, J.: Integrative windowing. J. of Artificial Intelligence Research 8, 129–164 (1998)
Blake, C., Merz, C.: UCI Repository of machine learning databases (1998)
IBM Almaden Research, Synthetic data generator, http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#classSynData
Zhu, X., Wu, X., Chen, Q.: Eliminating Class Noise in Large Datasets. In: Proc. of ICML, pp. 920–927 (2003)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Chen, Q., Wu, X., Zhu, X. (2005). Scalable Inductive Learning on Partitioned Data. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science(), vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25878-0
Online ISBN: 978-3-540-31949-8
eBook Packages: Computer Science (R0)