
Parallel Random Prism: A Computationally Efficient Ensemble Learner for Classification

Conference paper in Research and Development in Intelligent Systems XXIX (SGAI 2012)

Abstract

Classifiers generally tend to overfit when the training data are noisy or contain missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees; however, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, like any ensemble learner, Random Prism suffers from a high computational overhead, caused by the replication of the data and the induction of multiple base classifiers. Hence even modestly sized datasets may pose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale algorithms up to large datasets. This paper investigates the parallelisation of Random Prism, implements a prototype and evaluates it empirically on a Hadoop computing cluster.
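The overhead the abstract describes comes from the bagging-style structure shared by ensemble learners such as Random Prism: each base classifier is trained on its own bootstrap replicate of the data, and predictions are combined by voting. A minimal sketch of that general pattern, using a trivial stand-in base learner rather than the Prism rule inducer the paper actually parallelises (all names here are illustrative, not from the paper):

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Draw len(data) examples with replacement -- this replication of the
    # data for every base classifier is the overhead the paper targets.
    return [rng.choice(data) for _ in data]

class MajorityClassLearner:
    """Stand-in base classifier: predicts the most frequent class in its sample."""
    def fit(self, data):
        self.label = Counter(label for _, label in data).most_common(1)[0][0]
        return self
    def predict(self, x):
        return self.label

def bagging_ensemble(data, n_learners=10, seed=0):
    # Each base learner is independent, so this loop is what a MapReduce
    # framework such as Hadoop can distribute across a cluster.
    rng = random.Random(seed)
    return [MajorityClassLearner().fit(bootstrap_sample(data, rng))
            for _ in range(n_learners)]

def predict(ensemble, x):
    # Combine the base classifiers' outputs by majority vote.
    votes = Counter(clf.predict(x) for clf in ensemble)
    return votes.most_common(1)[0][0]

data = [((i,), "a") for i in range(9)] + [((9,), "b")]
ensemble = bagging_ensemble(data, n_learners=5)
print(predict(ensemble, (0,)))
```

Because the base learners are trained independently, the `fit` calls embarrassingly parallelise; only the final vote needs the learners' combined outputs, which maps naturally onto a map (train) and reduce (vote) decomposition.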




Author information

Correspondence to Frederic Stahl.


Copyright information

© 2012 Springer-Verlag London

About this paper

Cite this paper

Stahl, F., May, D., Bramer, M. (2012). Parallel Random Prism: A Computationally Efficient Ensemble Learner for Classification. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_2

  • DOI: https://doi.org/10.1007/978-1-4471-4739-8_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4738-1

  • Online ISBN: 978-1-4471-4739-8

  • eBook Packages: Computer Science (R0)
