
A Heterogeneous Computing System for Data Mining Workflows

  • Conference paper
Flexible and Efficient Information Handling (BNCOD 2006)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 4042))


Abstract

The computing-intensive Data Mining (DM) process calls for the support of a Heterogeneous Computing (HC) system, which consists of multiple computers with different configurations connected by a high-speed LAN, for increased computational power and resources. The DM process can be described as a multi-phase pipeline, and each phase may offer many optional methods. This makes the DM workflow very complex, so that it can be modelled only by a Directed Acyclic Graph (DAG). An HC system needs an effective and efficient scheduling framework that orchestrates all the computing hardware to execute multiple competing DM workflows. Motivated by the need for a practical solution to the scheduling problem for DM workflows, this paper proposes a dynamic DAG scheduling algorithm based on the characteristics of the execution time estimation model for DM jobs. Using an approximate estimate of job execution time, the algorithm first maps DM jobs to machines in a decentralized and diligent (as defined in this paper) manner. The performance of this initial mapping can then be improved through job migrations when necessary. The scheduling heuristic considers both the minimal completion time criterion and the critical path in a DAG. We implement this system in an established Multi-Agent System (MAS) environment, in which existing DM algorithms are reused by encapsulating them into agents. Practical classification problems are used to test and measure the system performance. The detailed experimental procedure and result analysis are also discussed in this paper.
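The abstract names two ingredients of the scheduling heuristic: the critical path in a DAG and the minimal completion time criterion. The sketch below shows one common way to combine them, a static list scheduler in the style of HEFT without communication costs. It is not the paper's algorithm: the dynamic, decentralized mapping and the job migration step are not modelled, and the function name `schedule`, the job names, and the run-time estimates are all hypothetical.

```python
from functools import lru_cache


def schedule(dag, est):
    """Static list scheduling sketch (illustrative, not the paper's method).

    dag: job -> list of successor jobs
    est: job -> list of estimated run times, one entry per machine
    Returns (placement, makespan), where placement maps job -> machine index.
    """
    n_machines = len(next(iter(est.values())))

    @lru_cache(maxsize=None)
    def rank(job):
        # Critical-path length from this job to the end of the DAG,
        # using the mean estimated run time across machines.
        mean = sum(est[job]) / n_machines
        return mean + max((rank(s) for s in dag.get(job, [])), default=0.0)

    # Invert the successor lists to get each job's predecessors.
    preds = {job: [] for job in est}
    for job, succs in dag.items():
        for s in succs:
            preds[s].append(job)

    machine_free = [0.0] * n_machines  # time at which each machine becomes idle
    finish = {}
    placement = {}

    # With positive run times, decreasing rank is a valid topological order,
    # so every predecessor is placed before its successors.
    for job in sorted(est, key=rank, reverse=True):
        ready = max((finish[p] for p in preds[job]), default=0.0)
        # Minimal completion time: pick the machine that finishes this job first.
        best = min(
            range(n_machines),
            key=lambda m: max(ready, machine_free[m]) + est[job][m],
        )
        start = max(ready, machine_free[best])
        finish[job] = start + est[job][best]
        machine_free[best] = finish[job]
        placement[job] = best

    return placement, max(finish.values())


# Hypothetical classification workflow on two machines with different speeds.
dag = {
    "load": ["clean"],
    "clean": ["tree", "svm"],
    "tree": ["eval"],
    "svm": ["eval"],
    "eval": [],
}
est = {
    "load": [2, 4],
    "clean": [3, 6],
    "tree": [4, 8],
    "svm": [6, 5],
    "eval": [1, 2],
}
placement, makespan = schedule(dag, est)
```

In this example the two alternative classifiers (`tree` and `svm`) become ready at the same time. The job on the longer remaining path is placed first, and the other is sent to the second machine because it would complete earlier there than by waiting for the first.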




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luo, P., Lü, K., He, Q., Shi, Z. (2006). A Heterogeneous Computing System for Data Mining Workflows. In: Bell, D.A., Hong, J. (eds) Flexible and Efficient Information Handling. BNCOD 2006. Lecture Notes in Computer Science, vol 4042. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11788911_15


  • DOI: https://doi.org/10.1007/11788911_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35969-2

  • Online ISBN: 978-3-540-35971-5

  • eBook Packages: Computer Science, Computer Science (R0)
