DMGrid: A Data Mining System Based on Grid Computing

  • Conference paper
Advanced Data Mining and Applications (ADMA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Abstract

Data mining researchers face a common problem: data mining tasks are time-consuming because they must process large-scale datasets. Grid computing integrates distributed, heterogeneous, and idle computers on the Internet into a high-performance service system, so it can supply the computing capacity needed to shorten task durations. We have developed DMGrid, a grid for data mining applications. DMGrid treats efficient parallel computing as a crucial aspect and also supports dynamic resource configuration. Unlike many existing data mining grids, DMGrid provides an engine that executes the algorithm flow specified in an application, and it offers application execution monitoring. Finally, we perform experiments and design two applications, Customer Churning Analysis and Customer Value Analysis, through which the feasibility of DMGrid is validated.

This work is supported by the National Natural Science Foundation of China under Grant 60402011 and the National Eleventh Five-Year Scientific and Technical Support Plans under Grant 2006BAH03B05.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Y., Xu, L., Geng, G., Zhao, X., Du, N. (2008). DMGrid: A Data Mining System Based on Grid Computing. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_29

  • DOI: https://doi.org/10.1007/978-3-540-88192-6_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88191-9

  • Online ISBN: 978-3-540-88192-6

  • eBook Packages: Computer Science, Computer Science (R0)