Adapting the Weka Data Mining Toolkit to a Grid Based Environment

Pérez, María S.; Sánchez, Alberto; Herrero, Pilar; Robles, Víctor; Peña, José M.

doi:10.1007/11495772_77

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3528)

Included in the following conference series:

International Atlantic Web Intelligence Conference

Abstract

Data Mining plays a key role in most enterprises, which must analyse large amounts of data in order to achieve higher profits. However, because of the large datasets involved in this process, the data mining field faces some technological challenges. Grid Computing takes advantage of the low-load periods of all the computers connected to a network, making resource and data sharing possible. Providing Grid services constitutes a flexible way of tackling data mining needs. This paper shows the adaptation of Weka, a widely used Data Mining tool, to a grid infrastructure.

Author information

Authors and Affiliations

Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain
María S. Pérez, Alberto Sánchez, Pilar Herrero, Víctor Robles & José M. Peña

Editor information

Editors and Affiliations

Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Piotr S. Szczepaniak
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Janusz Kacprzyk
Institute of Computer Science, Technical University of Łódź, Poland
Adam Niewiadomski

About this paper

Cite this paper

Pérez, M.S., Sánchez, A., Herrero, P., Robles, V., Peña, J.M. (2005). Adapting the Weka Data Mining Toolkit to a Grid Based Environment. In: Szczepaniak, P.S., Kacprzyk, J., Niewiadomski, A. (eds) Advances in Web Intelligence. AWIC 2005. Lecture Notes in Computer Science, vol 3528. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11495772_77

DOI: https://doi.org/10.1007/11495772_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26219-0
Online ISBN: 978-3-540-31900-9
eBook Packages: Computer Science, Computer Science (R0)
