Abstract
Data mining (DM) applications are composed of computing-intensive processing tasks working on huge datasets. Due to their computing-intensive nature, these applications are natural candidates for execution on high-performance, high-throughput platforms such as PC clusters and computational grids. Many data mining algorithms can be implemented as bag-of-tasks (BoT) applications, i.e., parallel applications composed of independent tasks. This paper discusses the use of computational grids for the execution of DM algorithms as BoT applications, investigates the scalability of such an execution, and proposes an approach to improve it.
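As a rough illustration of the bag-of-tasks idea, the sketch below runs several independent clustering tasks and gathers their results. It is not the authors' implementation: k-means is used only because it is compact (the abstract does not name a specific algorithm), a local thread pool stands in for grid workers, and the names `make_dataset`, `kmeans_task`, and `run_bag` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random


def make_dataset(n_points=200, seed=0):
    """Toy 1-D dataset with two well-separated groups (illustrative only)."""
    rng = random.Random(seed)
    left = [rng.gauss(0.0, 0.5) for _ in range(n_points // 2)]
    right = [rng.gauss(10.0, 0.5) for _ in range(n_points // 2)]
    return left + right


def kmeans_task(data, k, seed, iterations=20):
    """One independent task: a k-means run from its own random start.

    Returns (k, seed, sse), where sse is the within-cluster sum of squares.
    """
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: (x - centers[i]) ** 2)
            clusters[nearest].append(x)
        # Keep the old center when a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    sse = sum(min((x - c) ** 2 for c in centers) for x in data)
    return k, seed, sse


def run_bag(data, tasks, max_workers=4):
    """Run every (k, seed) task independently; collect results as they finish."""
    results = []
    # Assumption: a thread pool stands in for the grid's worker machines.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(kmeans_task, data, k, seed) for k, seed in tasks]
        for future in as_completed(futures):
            results.append(future.result())
    return results


data = make_dataset()
# The "bag": tasks share no state and can run in any order, on any worker.
bag = [(k, seed) for k in (1, 2, 3) for seed in range(3)]
results = run_bag(data, bag)

# Keep the best (lowest) SSE found for each k.
best_per_k = {}
for k, seed, sse in results:
    best_per_k[k] = min(sse, best_per_k.get(k, float("inf")))
```

Because no task depends on another, the only coordination needed is handing out tasks and collecting results, which is what makes this style of application a good fit for loosely coupled platforms such as grids.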
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
da Silva, F.A.B., Carvalho, S., Senger, H., Hruschka, E.R., de Farias, C.R.G. (2004). Running Data Mining Applications on the Grid: A Bag-of-Tasks Approach. In: Laganá, A., Gavrilova, M.L., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds) Computational Science and Its Applications – ICCSA 2004. ICCSA 2004. Lecture Notes in Computer Science, vol 3044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24709-8_18
DOI: https://doi.org/10.1007/978-3-540-24709-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22056-5
Online ISBN: 978-3-540-24709-8
eBook Packages: Springer Book Archive