Comparing Direct-to-Cache Transfer Policies to TCP/IP and M-VIA During Receive Operations in MPI Environments

Khunjush, Farshad; Dimopoulos, Nikitas J.

doi:10.1007/978-3-540-74742-0_21


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4742)

Included in the following conference series:

International Symposium on Parallel and Distributed Processing and Applications


Abstract

The main contributors to message delivery latency in message passing environments are the copying operations needed to transfer and bind a received message to the consuming process/thread. To reduce this copying overhead, we introduce architectural extensions comprising a specialized network cache and instructions. In this work, we study the possible overhead and cache pollution introduced through the operating system and the communications stack as exemplified by Linux, TCP/IP and M-VIA. We introduce this overhead in our simulation environment and study its effects on our proposed extensions. Ultimately, we have been able to compare the performance achieved by an application running on a system incorporating our extensions with the performance of the same application running on a standard system. The results show that our proposed approach can improve the performance of MPI applications by 10% to 20%.


Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Victoria, Victoria, B.C., Canada
Farshad Khunjush & Nikitas J. Dimopoulos


Editor information

Ivan Stojmenovic, Ruppa K. Thulasiram, Laurence T. Yang, Weijia Jia, Minyi Guo, Rodrigo Fernandes de Mello

About this paper

Cite this paper

Khunjush, F., Dimopoulos, N.J. (2007). Comparing Direct-to-Cache Transfer Policies to TCP/IP and M-VIA During Receive Operations in MPI Environments. In: Stojmenovic, I., Thulasiram, R.K., Yang, L.T., Jia, W., Guo, M., de Mello, R.F. (eds) Parallel and Distributed Processing and Applications. ISPA 2007. Lecture Notes in Computer Science, vol 4742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74742-0_21


DOI: https://doi.org/10.1007/978-3-540-74742-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74741-3
Online ISBN: 978-3-540-74742-0
eBook Packages: Computer Science, Computer Science (R0)
