Parallel Rule Induction with Information Theoretic Pre-Pruning

  • Conference paper
Research and Development in Intelligent Systems XXVI

Abstract

In a world where data are captured on a large scale, the major challenge for data mining algorithms is to scale up to large datasets. There are two main approaches to inducing classification rules: the divide and conquer approach, also known as top-down induction of decision trees, and the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach; however, very little work has been conducted on scaling up the separate and conquer approach.

In this work we describe a parallel framework that allows the parallelisation of a particular family of separate and conquer algorithms, the Prism family. Parallelisation lets Prism algorithms harvest additional computing resources in a network of computers, so that the induction of classification rules scales better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
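The abstract does not spell out the algorithms, so what follows is only a minimal Python sketch under explicit assumptions, not the authors' implementation. It shows a basic Prism-style separate and conquer learner for a single target class on categorical data. At each specialisation step, the attributes are split across hypothetical "workers" (vertical partitioning, assumed here by loose analogy with the authors' related attribute-partitioning work). Each worker proposes its locally best term, and a coordinator picks the global best. Pre-pruning uses the Smyth and Goodman J-measure with a simplified stop rule: stop specialising when adding the best term would lower the rule's J-value. That rule is an assumption and may differ from the paper's exact criterion. All function names (`j_measure`, `local_best_term`, `induce_rules`), the `partitions` parameter, and the toy dataset are hypothetical. The workers run sequentially here.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Instance = Dict[str, str]            # attribute -> categorical value, plus a "class" key
Term = Tuple[str, str]               # (attribute, value)
Candidate = Tuple[float, int, Term]  # (P(target | term), target hits, term)


def j_measure(covered: Sequence[Instance], total: int, prior: float, target: str) -> float:
    """Smyth & Goodman J-measure of 'IF <antecedent> THEN class = target'."""
    if not covered or total == 0:
        return 0.0
    p_y = len(covered) / total                                         # P(antecedent)
    p_x_y = sum(i["class"] == target for i in covered) / len(covered)  # rule accuracy

    def part(a: float, b: float) -> float:
        return 0.0 if a == 0 else a * math.log2(a / b)                 # 0 * log 0 := 0

    return p_y * (part(p_x_y, prior) + part(1 - p_x_y, 1 - prior))


def local_best_term(data: Sequence[Instance], attributes: Sequence[str],
                    target: str, used: Set[str]) -> Optional[Candidate]:
    """Work of one hypothetical worker: best term over the attributes it holds."""
    best: Optional[Candidate] = None
    for attr in attributes:
        if attr in used:
            continue
        for value in sorted({inst[attr] for inst in data}):
            covered = [i for i in data if i[attr] == value]
            hits = sum(i["class"] == target for i in covered)
            cand = (hits / len(covered), hits, (attr, value))
            if best is None or cand[:2] > best[:2]:  # highest probability, then coverage
                best = cand
    return best


def induce_rules(data: Sequence[Instance], partitions: Sequence[Sequence[str]],
                 target: str, j_prune: bool = True) -> List[List[Term]]:
    """Prism-style separate and conquer for one target class (sketch)."""
    remaining = list(data)
    rules: List[List[Term]] = []
    while any(i["class"] == target for i in remaining):
        prior = sum(i["class"] == target for i in remaining) / len(remaining)
        subset, rule, used = remaining, [], set()
        best_j = j_measure(subset, len(remaining), prior, target)
        while any(i["class"] != target for i in subset):
            # Each partition proposes its local best term. A parallel framework would
            # evaluate these on separate machines; here they run one after another.
            proposals = [local_best_term(subset, p, target, used) for p in partitions]
            candidates = [c for c in proposals if c is not None]
            if not candidates:
                break                                   # all attributes used: clash
            _, _, (attr, value) = max(candidates, key=lambda c: c[:2])
            new_subset = [i for i in subset if i[attr] == value]
            new_j = j_measure(new_subset, len(remaining), prior, target)
            if j_prune and new_j < best_j:
                break                                   # pre-prune: J-value would drop
            rule.append((attr, value))
            used.add(attr)
            subset, best_j = new_subset, new_j
        rules.append(rule)
        # Separate and conquer: remove the instances covered by the new rule.
        remaining = [i for i in remaining if not all(i[a] == v for a, v in rule)]
    return rules


data = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "stay"},
    {"outlook": "rain", "windy": "no", "class": "play"},
    {"outlook": "rain", "windy": "yes", "class": "stay"},
    {"outlook": "overcast", "windy": "yes", "class": "play"},
]
partitions = [["outlook"], ["windy"]]  # hypothetical: one attribute per worker
print(induce_rules(data, partitions, "play"))
# [[('windy', 'no')], [('outlook', 'overcast')]]
```

In this sketch, partitioning by attribute rather than by instance means each worker scores only its own columns. Per specialisation step, it sends just a small (probability, coverage, term) tuple to the coordinator. Whether the real framework exchanges exactly this information is an assumption.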

Author information

Corresponding author

Correspondence to Mo Adda.

Copyright information

© 2010 Springer-Verlag London

About this paper

Cite this paper

Stahl, F., Bramer, M., Adda, M. (2010). Parallel Rule Induction with Information Theoretic Pre-Pruning. In: Bramer, M., Ellis, R., Petridis, M. (eds) Research and Development in Intelligent Systems XXVI. Springer, London. https://doi.org/10.1007/978-1-84882-983-1_11

  • DOI: https://doi.org/10.1007/978-1-84882-983-1_11

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84882-982-4

  • Online ISBN: 978-1-84882-983-1

  • eBook Packages: Computer Science, Computer Science (R0)
