Abstract
Data mining applications consist of compute-intensive processing tasks, which makes them natural candidates for execution on high-performance, high-throughput platforms such as PC clusters and computational grids. Moreover, some data mining algorithms can be implemented as Bag-of-Tasks (BoT) applications, which are composed of parallel, independent tasks. Because these tasks do not depend on one another, adapting BoT applications to the grid is straightforward. This work therefore proposes a scheduling algorithm for running BoT data mining applications on grid platforms. The algorithm is evaluated in several experiments, and the results show that it improves both the scalability and the performance of such applications.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
da Silva, F.A.B., Carvalho, S., Hruschka, E.R. (2004). A Scheduling Algorithm for Running Bag-of-Tasks Data Mining Applications on the Grid. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds) Euro-Par 2004 Parallel Processing. Euro-Par 2004. Lecture Notes in Computer Science, vol 3149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27866-5_33
DOI: https://doi.org/10.1007/978-3-540-27866-5_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22924-7
Online ISBN: 978-3-540-27866-5
eBook Packages: Springer Book Archive