Abstract
With the rapid advancement of information technology, scalability has become a necessity for learning algorithms to deal with large, real-world data repositories. In this paper, scalability is accomplished through a data reduction technique, which partitions a large data set into subsets, applies a learning algorithm to each subset sequentially or concurrently, and then integrates the learned results. Five strategies to achieve scalability (Rule-Example Conversion, Rule Weighting, Iteration, Good Rule Selection, and Data Dependent Rule Selection) are identified, and seven corresponding scalable schemes are designed and developed. A substantial number of experiments have been performed to evaluate these schemes. Experimental results demonstrate that, through data reduction, some of our schemes can effectively combine the weak classifiers learned from data subsets into accurate classifiers. Furthermore, our schemes require significantly less training time than that required to build a global classifier.
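The partition-learn-integrate pattern described above can be sketched in a few lines. The following is a minimal illustration, not the authors' algorithms: the toy 1R-style rule learner and all function names are assumptions, but the overall flow (partition the data, learn weighted rules on each subset, integrate by weighted voting, as in the Rule Weighting strategy) mirrors the abstract.

```python
# Hedged sketch of partitioned learning with rule-weight integration.
# The 1R-style learner below is a stand-in for any base learner.
import random
from collections import Counter

def partition(data, k):
    """Split a dataset into k roughly equal subsets."""
    data = list(data)
    random.shuffle(data)
    return [data[i::k] for i in range(k)]

def learn_rules(subset):
    """Toy learner: for each (feature_index, value) pair, predict the
    majority class among matching examples in this subset. Each rule is
    weighted by its accuracy on the subset (Rule Weighting)."""
    counts = {}
    for x, y in subset:
        for i, v in enumerate(x):
            counts.setdefault((i, v), Counter())[y] += 1
    rules = {}
    for key, ctr in counts.items():
        label, hits = ctr.most_common(1)[0]
        rules[key] = (label, hits / sum(ctr.values()))  # (prediction, weight)
    return rules

def integrate_predict(models, x):
    """Integrate the weak classifiers: every matching rule from every
    subset model casts a vote proportional to its weight."""
    votes = Counter()
    for model in models:
        for i, v in enumerate(x):
            if (i, v) in model:
                label, w = model[(i, v)]
                votes[label] += w
    return votes.most_common(1)[0][0] if votes else None

# Usage: learn on 3 partitions of a tiny categorical dataset.
data = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
        (("rain", "mild"), "yes"), (("rain", "cool"), "yes"),
        (("overcast", "hot"), "yes"), (("overcast", "cool"), "yes")]
random.seed(0)
models = [learn_rules(s) for s in partition(data, 3)]
print(integrate_predict(models, ("overcast", "cool")))  # -> "yes"
```

Because each subset model is learned independently, the per-subset learning step can run concurrently, which is where the training-time savings over a single global classifier come from.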
References
Michalski, R., Mozetic, I., Hong, J., Lavrac, N.: The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In: Proc. of AAAI (1986)
Quinlan, J.: C4.5: programs for machine learning. Morgan Kaufmann, San Francisco (1993)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont, CA (1984)
Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–285 (1989)
Breiman, L.: Bagging predictors. Technical Report 421, Dept. of Statistics, UC Berkeley, CA (1994)
Wu, X.: Knowledge Acquisition from Databases. Ablex Publishing Corp., Greenwich (1995)
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: An overview. In: Advances in Knowledge Discovery and Data Mining, Menlo Park (1996)
Wu, X., Lo, W.: Multi-layer incremental induction. In: Proc. of PRICAI, pp. 24–32 (1998)
Provost, F., Kolluri, V.: Scaling up inductive algorithms: An overview. In: Proc. of KDD, CA, pp. 239–242 (1997)
Provost, F., Kolluri, V.: A survey of methods for scaling up inductive algorithms. Data Mining and Knowledge Discovery (1999)
Huber, P.: From large to huge: A statistician's reaction to KDD and DM. In: Proc. of KDD, CA, pp. 304–308 (1997)
Chan, P.: An extensible meta-learning approach for scalable and accurate inductive learning. Ph.D. thesis, Columbia Univ. (1996)
Cheung, D., Ng, V., Fu, A., Fu, Y.: Efficient Mining of Association Rules in Distributed Databases. IEEE Transactions on Knowledge and Data Engineering 8(6), 911–922 (1996)
Wu, X., Zhang, S.: Synthesizing High-Frequency Rules from Different Data Sources. IEEE Transactions on Knowledge and Data Engineering 15(2), 353–367 (2003)
Quinlan, J.: Learning efficient classification procedures and their application to chess end games. In: Machine Learning: An AI Approach. Morgan Kaufmann, CA (1983)
Fürnkranz, J.: Integrative windowing. J. of Artificial Intelligence Research 8, 129–164 (1998)
Blake, C., Merz, C.: UCI Repository of machine learning databases (1998)
IBM Almaden Research, Synthetic data generator, http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#classSynData
Zhu, X., Wu, X., Chen, Q.: Eliminating Class Noise in Large Datasets. In: Proc. of ICML, pp. 920–927 (2003)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Chen, Q., Wu, X., Zhu, X. (2005). Scalable Inductive Learning on Partitioned Data. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science(), vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25878-0
Online ISBN: 978-3-540-31949-8
eBook Packages: Computer Science (R0)