Efficient Communication Using Message Prediction for Cluster of Multiprocessors

Afsahi, Ahmad; Dimopoulos, Nikitas J.

doi:10.1007/10720115_12


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1797)

Included in the following conference series:

International Workshop on Communication, Architecture, and Applications for Network-Based Parallel Computing


Abstract

As uniprocessor and SMP computation power increases, interprocessor communication has become an important factor limiting the performance of clusters of workstations. Many factors, including communication hardware overhead, communication software overhead, and user environment overhead (multithreading, multiuser), affect the performance of the communication subsystem. A significant portion of the software communication overhead is attributed to message copying. Ideally, a true zero-copy protocol would move the message directly from the send buffer in the sender's user space to the receive buffer at the destination. However, because the sending side does not know the final receive buffer address, early-arriving messages have to be buffered in a temporary area. In this work, we show that message-passing applications exhibit message reception communication locality. We have exploited this locality to devise different message predictors at the receiving side of communications. In essence, these message predictors can be used to drain the network and cache the incoming messages even if the corresponding receive calls have not been posted yet. The performance of these predictors, in terms of hit ratio, on some parallel applications is quite promising and suggests that prediction has the potential to eliminate most of the remaining message copies.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Victoria, P.O. Box 3055, Victoria, B.C., V8W 3P6, Canada
Ahmad Afsahi & Nikitas J. Dimopoulos

Editor information

Editors and Affiliations

Electrical and Computer Engineering, Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
Babak Falsafi
Department of Computer Science and Engineering, The Ohio State University
Mario Lauria

Cite this paper

Afsahi, A., Dimopoulos, N.J. (2000). Efficient Communication Using Message Prediction for Cluster of Multiprocessors. In: Falsafi, B., Lauria, M. (eds) Network-Based Parallel Computing. Communication, Architecture, and Applications. CANPC 2000. Lecture Notes in Computer Science, vol 1797. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720115_12

DOI: https://doi.org/10.1007/10720115_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67879-3
Online ISBN: 978-3-540-44655-2
eBook Packages: Springer Book Archive
