An enhancer of memory and network for applications with large-capacity data and non-continuous data accessing

Tanabe, Noboru; Hakozaki, Hirotaka; Ando, Hiroshi; Dohi, Yasunori; Luo, Zhengzhe; Nakajo, Hironori

doi:10.1007/s11227-009-0373-7

Published: 18 December 2009

The Journal of Supercomputing, Volume 51, pages 279–309 (2010)

Abstract

The performance of memory and I/O systems has not kept pace with that of COTS (Commercial Off-The-Shelf) CPUs. PC clusters built from COTS CPUs have been employed for HPC. A cache-based processor is far less effective than a vector processor in applications with low spatial locality. Moreover, in HPC, Google-like server farms, and database processing, insufficient main-memory capacity poses a serious problem, and the power consumption of a Google-like server farm or a high-end HPC PC cluster is very large. To overcome these problems, we propose the concept of a memory and network enhancer equipped with scatter and gather vector access functions, high-performance network connectivity, and extensible capacity. We also propose two communication mechanisms, LHS and LHC, which are architectures for reducing the latency of mixed messages consisting of small control data and a large data body. Examples of killer applications for this new type of hardware are presented. Beyond concepts and simulations, this paper presents real hardware prototypes named DIMMnet-2 and DIMMnet-3, and evaluates both memory-related and network-related issues. For the memory issues, we evaluate the module with the NAS CG benchmark (class C) and the Wisconsin benchmarks. Although CG class C is difficult to evaluate with conventional cycle-accurate simulation methods, we obtained class C results using our own method. We find that the module improves performance by up to about 25 times on the Wisconsin benchmarks. However, the results on a cache-based PC show that cache-line flushing degrades the speedup. This indicates the high potential of combining the proposed extended memory module with processors that access main memory via DMA and therefore do not need cache-line flushing, such as the SPU on the Cell/B.E. Finally, we evaluate the LHS and LHC communication mechanisms and show their effects on latency.
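The scatter and gather vector access functions mentioned above target non-continuous access patterns, such as the indirect vector reads in the sparse matrix-vector product at the core of the NAS CG benchmark. As a software-only illustration (not the DIMMnet-2/3 interface, which the abstract does not describe), the Python sketch below shows what a gather and a scatter compute, and how a sparse matrix-vector product reduces to one gather per row. The function names `gather`, `scatter`, and `csr_spmv` and the CSR (compressed sparse row) storage layout are hypothetical choices for this example.

```python
from typing import List, Sequence


def gather(src: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Pack src[i] for each i in indices into a dense, contiguous list.

    Conceptually, a memory-side gather unit does this packing near the DRAM,
    so the CPU reads only the packed result rather than touching one cache
    line per scattered element (illustrative model, not a hardware API).
    """
    return [src[i] for i in indices]


def scatter(dst: List[float], indices: Sequence[int], values: Sequence[float]) -> None:
    """Write values[k] to dst[indices[k]] in place (the inverse of gather)."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    for i, v in zip(indices, values):
        dst[i] = v


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout).

    The x[col_idx[...]] reads are the irregular, low-locality part of the
    kernel; here each row's reads are expressed as a single gather.
    """
    y: List[float] = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        xs = gather(x, col_idx[start:end])
        y.append(sum(a * b for a, b in zip(vals[start:end], xs)))
    return y


if __name__ == "__main__":
    x = [10.0, 20.0, 30.0, 40.0]
    print(gather(x, [3, 0, 2]))  # [40.0, 10.0, 30.0]

    buf = [0.0] * 4
    scatter(buf, [1, 3], [5.0, 7.0])
    print(buf)  # [0.0, 5.0, 0.0, 7.0]

    # A = [[1, 0, 2], [0, 3, 0]] in CSR form, multiplied by [1, 1, 1]
    print(csr_spmv([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```

The per-row gather makes the point of the design visible: once the scattered operands are packed densely, the multiply-accumulate loop runs over contiguous data, which is the access pattern cache-based CPUs handle well.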



Author information

Authors and Affiliations

Toshiba R&D Center, Kawasaki, Kanagawa, Japan
Noboru Tanabe
Yokohama National University, Yokohama, Kanagawa, Japan
Hirotaka Hakozaki, Hiroshi Ando & Yasunori Dohi
Tokyo University of Agriculture and Technology, Koganei, Tokyo, Japan
Zhengzhe Luo & Hironori Nakajo

Corresponding author

Correspondence to Noboru Tanabe.

About this article

Cite this article

Tanabe, N., Hakozaki, H., Ando, H. et al. An enhancer of memory and network for applications with large-capacity data and non-continuous data accessing. J Supercomput 51, 279–309 (2010). https://doi.org/10.1007/s11227-009-0373-7

Issue Date: March 2010
