Using Mining@Home for Distributed Ensemble Learning

  • Conference paper
Data Management in Cloud, Grid and P2P Systems (Globe 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7450)

Abstract

Mining@Home was recently designed as a distributed architecture for running data mining applications according to the “volunteer computing” paradigm. It has already proved efficient and scalable for the discovery of frequent itemsets from a transactional database. However, it can also be adopted in other scenarios, especially those where the overall application can be divided into distinct jobs that may be executed in parallel and where input data can be reused, which naturally leads to the use of data cachers. This paper describes the architecture and implementation of the Mining@Home system and evaluates its performance for the execution of ensemble learning applications. In this scenario, multiple learners compute models from the same input data, so as to extract a final model with higher statistical accuracy. Performance evaluation on a real network, reported in the paper, confirms the efficiency and scalability of the framework.
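
As a rough illustration of the ensemble learning scenario described in the abstract, the following minimal Python sketch trains several simple learners as independent jobs over the same input data and combines them by majority vote. It is not the Mining@Home implementation: the toy dataset, the threshold ("stump") learner, the bootstrap sampling, and the names `train_stump`, `ensemble_predict`, and `run_ensemble` are all assumptions made for illustration, and a local thread pool merely stands in for volunteer nodes.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dataset of (feature, label) pairs, shared by every job.
DATA = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]


def train_stump(seed):
    """One independent job: fit a threshold classifier on a bootstrap sample."""
    rng = random.Random(seed)  # per-job seed keeps the sketch reproducible
    sample = [rng.choice(DATA) for _ in DATA]
    best_threshold, best_errors = None, len(sample) + 1
    # Try each sampled feature value as a threshold; keep the one
    # that misclassifies the fewest points of the sample.
    for threshold, _ in sample:
        errors = sum((x >= threshold) != bool(y) for x, y in sample)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def ensemble_predict(thresholds, x):
    """Combine the individual models by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


def run_ensemble(n_jobs=5):
    """Run the jobs in parallel; a thread pool stands in for volunteer nodes."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(train_stump, range(n_jobs)))


if __name__ == "__main__":
    models = run_ensemble(5)
    print(ensemble_predict(models, 0.0), ensemble_predict(models, 5.0))
```

Because every job reads the same `DATA`, a node that has already fetched the input could serve it to later jobs, which is the kind of reuse that motivates data cachers in the abstract.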

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cesario, E., Mastroianni, C., Talia, D. (2012). Using Mining@Home for Distributed Ensemble Learning. In: Hameurlain, A., Hussain, F.K., Morvan, F., Tjoa, A.M. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2012. Lecture Notes in Computer Science, vol 7450. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32344-7_9

  • DOI: https://doi.org/10.1007/978-3-642-32344-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32343-0

  • Online ISBN: 978-3-642-32344-7

  • eBook Packages: Computer Science, Computer Science (R0)
