Abstract
As uniprocessor and SMP computation power increases, interprocessor communication has become an important factor limiting the performance of clusters of workstations. Many factors, including communication hardware overhead, communication software overhead, and user environment overhead (multithreading, multiuser), affect the performance of the communication subsystem. A significant portion of the software communication overhead is attributed to message copying. Ideally, a true zero-copy protocol would move the message directly from the send buffer in the sender's user space to the receive buffer at the destination. However, because the send side does not know the final receive buffer address, early-arriving messages have to be buffered in a temporary area. In this work, we show that there is message reception communication locality in message-passing applications. We have utilized this communication locality and devised different message predictors at the receiver side of communications. In essence, these message predictors can be used to drain the network and cache incoming messages even if the corresponding receive calls have not yet been posted. The performance of these predictors, in terms of hit ratio, on some parallel applications is quite promising and suggests that prediction has the potential to eliminate most of the remaining message copies.
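To make the notion of a receiver-side predictor and its hit ratio concrete, the following is a minimal sketch only. It is not one of the predictors devised in the paper, whose designs the abstract does not describe. It assumes a hypothetical trace of sender ranks observed at one receiver and uses a simple order-k context heuristic: predict the sender that most often followed the last k senders. The function name `hit_ratio` and its parameters are illustrative.

```python
from collections import Counter, defaultdict


def hit_ratio(senders, order=1):
    """Return the fraction of receptions whose sender was predicted correctly.

    Illustrative heuristic (an assumption, not the paper's method): remember
    which sender most often followed each window of the previous `order`
    senders, and predict that sender the next time the window recurs.
    Receptions for which no prediction can be made yet count as misses.
    """
    total = max(len(senders) - order, 0)
    if total == 0:
        return 0.0

    history = defaultdict(Counter)  # context window -> counts of next sender
    hits = 0
    for i in range(order, len(senders)):
        context = tuple(senders[i - order:i])
        seen = history[context]
        if seen:
            predicted = seen.most_common(1)[0][0]
            if predicted == senders[i]:
                hits += 1
        # Learn from the actual sender after the prediction is scored.
        seen[senders[i]] += 1
    return hits / total


if __name__ == "__main__":
    # Hypothetical trace: a receiver that hears from ranks 1, 2, 3 in a cycle.
    trace = [1, 2, 3] * 4
    print(f"hit ratio: {hit_ratio(trace):.2f}")
```

On a strictly periodic trace such as the one above, the heuristic misses only while it is still learning the cycle, which is the kind of reception locality a predictor could exploit to pre-place incoming messages.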
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Afsahi, A., Dimopoulos, N.J. (2000). Efficient Communication Using Message Prediction for Cluster of Multiprocessors. In: Falsafi, B., Lauria, M. (eds) Network-Based Parallel Computing. Communication, Architecture, and Applications. CANPC 2000. Lecture Notes in Computer Science, vol 1797. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720115_12
DOI: https://doi.org/10.1007/10720115_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67879-3
Online ISBN: 978-3-540-44655-2
eBook Packages: Springer Book Archive