Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7997)

Abstract

The emergence of new hybrid and heterogeneous large-scale platforms that combine multiple GPUs and multiple CPUs offers new opportunities and poses new challenges when solving difficult optimization problems. This paper targets irregular tree search algorithms in which the workload is unpredictable. We propose an adaptive distributed approach that distributes the load dynamically at runtime while taking into account the computing abilities of each device, whether GPU or CPU. Using Branch-and-Bound and FlowShop as a case study, we deployed our approach using up to 20 GPUs and 128 CPUs. Through extensive experiments in different system configurations, we report near-optimal speedups, thus providing new insights into how to take full advantage of both GPU and CPU power in modern computing platforms.
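
The abstract does not spell out how the load is distributed, so the following is only a minimal sketch of one way to "take into account the computing abilities" of two devices when work is transferred between them. It assumes a hypothetical scalar `power` estimate per device and hands the requesting device a share of the pending tree nodes proportional to its relative power. The function name `split_work`, the power values, and the proportional rule are illustrative assumptions, not the method described in the paper.

```python
from collections import deque


def split_work(pool, victim_power, thief_power):
    """Remove and return a share of `pool` for the requesting device.

    Assumption (not from the paper): the share is proportional to the
    requester's power relative to the combined power of both devices.
    """
    if victim_power <= 0 or thief_power <= 0:
        raise ValueError("computing powers must be positive")
    share = thief_power / (victim_power + thief_power)
    n_stolen = int(len(pool) * share)  # round down; victim keeps the rest
    stolen = deque()
    for _ in range(n_stolen):
        stolen.append(pool.pop())  # take nodes from the tail of the pool
    return stolen


# Hypothetical example: a CPU holding 100 pending nodes is asked for work
# by a GPU estimated to be three times faster.
victim_pool = deque(range(100))
stolen = split_work(victim_pool, victim_power=1.0, thief_power=3.0)
print(len(victim_pool), len(stolen))
```

With equal power estimates the rule reduces to the familiar steal-half policy, while a faster requester receives a larger share, which is the intuition behind power-aware balancing in a heterogeneous setting.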

Acknowledgments

This material is based on work supported by the INRIA HEMERA project. Experiments presented in this paper were carried out using the Grid5000 experimental testbed, which is being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr). Thanks also to Imen Chakroun for her valuable contributions to the code development of the GPU kernel.

Author information

Corresponding author

Correspondence to Trong-Tuan Vu.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vu, TT., Derbel, B., Melab, N. (2013). Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search. In: Nicosia, G., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 2013. Lecture Notes in Computer Science, vol 7997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44973-4_11

  • DOI: https://doi.org/10.1007/978-3-642-44973-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-44972-7

  • Online ISBN: 978-3-642-44973-4

  • eBook Packages: Computer Science, Computer Science (R0)
