Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9040)

Abstract

Heterogeneous clusters using accelerators are widely used as high-performance computing systems. In such systems, inter-node communication among accelerators becomes a bottleneck because of the data transfer between the accelerator and the host.

To eliminate this overhead, we have been developing a novel communication system that realizes direct communication among accelerators across computation nodes, under the HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) project. We are also investigating a high-level parallel programming language and several practical application programs based on our concept, as well as studying enhancements to the Tightly Coupled Accelerators (TCA) architecture and developing a system software stack in the CREST project.

Author information

Corresponding author

Correspondence to Toshihiro Hanawa.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Hanawa, T. et al. (2015). Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing. In: Sano, K., Soudris, D., Hübner, M., Diniz, P. (eds) Applied Reconfigurable Computing. ARC 2015. Lecture Notes in Computer Science, vol 9040. Springer, Cham. https://doi.org/10.1007/978-3-319-16214-0_43

  • DOI: https://doi.org/10.1007/978-3-319-16214-0_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16213-3

  • Online ISBN: 978-3-319-16214-0

  • eBook Packages: Computer Science, Computer Science (R0)
