Using Mining@Home for Distributed Ensemble Learning

Cesario, Eugenio; Mastroianni, Carlo; Talia, Domenico

doi:10.1007/978-3-642-32344-7_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7450)

Included in the following conference series:

International Conference on Data Management in Cloud, Grid and P2P Systems

Abstract

Mining@Home was recently designed as a distributed architecture for running data mining applications according to the “volunteer computing” paradigm. Mining@Home has already proved efficient and scalable for the discovery of frequent itemsets in a transactional database. However, it can also be adopted in other scenarios, especially those where the overall application can be divided into distinct jobs that may be executed in parallel and where input data can be reused, which naturally leads to the use of data cachers. This paper describes the architecture and implementation of the Mining@Home system and evaluates its performance for the execution of ensemble learning applications. In this scenario, multiple learners compute models from the same input data, and these models are combined into a final model with stronger statistical accuracy. Performance evaluation on a real network, reported in the paper, confirms the efficiency and scalability of the framework.
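The ensemble learning scenario described in the abstract can be illustrated with a minimal Python sketch. It is not the Mining@Home implementation or its API: the dataset, the function names (`train_stump`, `predict`, `run_ensemble`), and the use of a local thread pool in place of volunteer nodes are all assumptions made for illustration. Each learner is an independent job that reads the same input data (here a module-level list standing in for data served by a cacher), and the resulting models are combined by majority vote.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dataset shared by all jobs: (feature, label) pairs,
# where the label is 1 when the feature is >= 5.
DATA = [(x, int(x >= 5)) for x in range(10)]


def train_stump(seed):
    """One independent job: fit a threshold classifier on a bootstrap sample."""
    rng = random.Random(seed)
    # Bootstrap sample: draw len(DATA) examples with replacement.
    sample = [rng.choice(DATA) for _ in DATA]
    best_threshold, best_errors = None, len(sample) + 1
    # Try every threshold between 1 and 9 and keep the first one
    # with the fewest misclassified examples in the sample.
    for threshold in range(1, 10):
        errors = sum(int(x >= threshold) != y for x, y in sample)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(thresholds, x):
    """Combine the individual models by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


def run_ensemble(n_learners=5):
    """Run the learners as parallel jobs over the same input data."""
    # A local thread pool stands in for remote volunteer nodes (assumption).
    with ThreadPoolExecutor(max_workers=n_learners) as pool:
        return list(pool.map(train_stump, range(n_learners)))


if __name__ == "__main__":
    models = run_ensemble(5)
    print("thresholds:", models)
    print("prediction for x=9:", predict(models, 9))
```

Because the jobs share no state apart from the read-only input data, they can be scheduled in any order and on any worker, which is the property the abstract identifies as making an application suitable for this kind of architecture.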

Author information

Authors and Affiliations

ICAR-CNR, Italy
Eugenio Cesario, Carlo Mastroianni & Domenico Talia
University of Calabria, Italy
Domenico Talia

Editor information

Editors and Affiliations

Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University, 118 route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain & Franck Morvan
Centre for Quantum Computation and Intelligent Systems, Decision Support and e-Service Intelligence Lab, School of Software, University of Technology, Faculty of Engineering and Information Technology, 2007, Sydney, Ultimo, NSW, Australia
Farookh Khadeer Hussain
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstraße 9-11, 1040, Wien, Austria
A Min Tjoa

About this paper

Cite this paper

Cesario, E., Mastroianni, C., Talia, D. (2012). Using Mining@Home for Distributed Ensemble Learning. In: Hameurlain, A., Hussain, F.K., Morvan, F., Tjoa, A.M. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2012. Lecture Notes in Computer Science, vol 7450. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32344-7_9

DOI: https://doi.org/10.1007/978-3-642-32344-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32343-0
Online ISBN: 978-3-642-32344-7
eBook Packages: Computer Science, Computer Science (R0)
