HostoSink: A Collaborative Scheduling in Heterogeneous Environment

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8630)

Abstract

Owing to limits on power consumption and memory capacity, recent years have seen a strong trend toward heterogeneous environments equipped with accelerators such as GPUs (Graphics Processing Units), FPGAs (Field Programmable Gate Arrays), and MIC (Many Integrated Core) coprocessors, which help traditional SMP (Symmetric Multi-Processing) CPUs speed up applications. In this paper, we choose the Intel MIC architecture coprocessor as the accelerator and design HostoSink, a runtime system for collaborative scheduling based on Pthread tasks. HostoSink uses runtime characteristics of the application and of the heterogeneous environment to schedule Pthread tasks between the CPU and the MIC automatically and dynamically. It thereby gives MIC users an easier way to obtain high performance in a heterogeneous CPU-MIC environment without extensive manual optimization of the original Pthread-based multi-threaded applications. Experimental results show that HostoSink achieves an overall speedup of more than 3x over the original CPU-only execution, and that the average amount of data transferred between the CPU and the MIC is also reduced.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Liao, X., Xiang, X., Jin, H., Zhang, W., Lu, F. (2014). HostoSink: A Collaborative Scheduling in Heterogeneous Environment. In: Sun, Xh., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2014. Lecture Notes in Computer Science, vol 8630. Springer, Cham. https://doi.org/10.1007/978-3-319-11197-1_17

  • DOI: https://doi.org/10.1007/978-3-319-11197-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11196-4

  • Online ISBN: 978-3-319-11197-1

  • eBook Packages: Computer Science (R0)
