Abstract
Data mining researchers face a common problem: data mining tasks are time-consuming because they must process large-scale datasets. Grid computing integrates distributed, heterogeneous, and idle computers on the Internet into a high-performance service system. It is therefore possible to use grid computing to provide the computational capability needed to reduce task durations effectively. We have developed DMGrid, a grid for data mining applications. DMGrid treats efficient parallel computing as a crucial aspect and also takes dynamic resource configuration into account. Unlike many existing data mining grids, DMGrid also provides an engine that executes the algorithm flow specified in an application. Moreover, it offers application execution monitoring. Finally, we perform experiments and design two applications, Customer Churning Analysis and Customer Value Analysis, through which the feasibility of DMGrid is validated.
This work is supported by the National Natural Science Foundation of China under Grant 60402011 and the National Eleventh Five-Year Scientific and Technical Support Plans under Grant 2006BAH03B05.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Y., Xu, L., Geng, G., Zhao, X., Du, N. (2008). DMGrid: A Data Mining System Based on Grid Computing. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_29
DOI: https://doi.org/10.1007/978-3-540-88192-6_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science