Efficient Communication Using Message Prediction for Cluster of Multiprocessors

  • Conference paper
Network-Based Parallel Computing. Communication, Architecture, and Applications (CANPC 2000)

Abstract

As uniprocessor and SMP computation power increases, interprocessor communication has become an important factor limiting the performance of clusters of workstations. Many factors, including communication hardware overhead, communication software overhead, and user environment overhead (multithreading, multiuser), affect the performance of the communication subsystem. A significant portion of the software communication overhead is attributed to message copying. Ideally, a true zero-copy protocol would move the message directly from the send buffer in the sender's user space to the receive buffer at the destination. However, because the sending side does not know the final receive buffer address, early-arriving messages have to be buffered in a temporary area. In this work, we show that message-passing applications exhibit message reception communication locality. We have exploited this locality to devise different message predictors at the receiver side of communications. In essence, these message predictors can be used to drain the network and cache incoming messages even if the corresponding receive calls have not yet been posted. The performance of these predictors, in terms of hit ratio, on some parallel applications is quite promising and suggests that prediction has the potential to eliminate most of the remaining message copies.
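
To make the notion of a receiver-side predictor and its hit ratio concrete, the following is a minimal sketch and not one of the predictors evaluated in the paper, whose designs are not given in the abstract. It assumes each incoming message can be reduced to a single identifier (for example, the sending process rank), and it uses a hypothetical "last successor" heuristic: predict that the next message will be whatever followed the current one the last time it was seen. The names `LastSuccessorPredictor`, `hit_ratio`, and the example trace are all illustrative assumptions.

```python
class LastSuccessorPredictor:
    """Hypothetical predictor: guesses the next message identifier to be
    whatever followed the most recent identifier the last time it occurred."""

    def __init__(self):
        self._successor = {}  # identifier -> identifier that followed it last time
        self._last = None     # most recently received identifier

    def predict(self):
        """Return the predicted next identifier, or None if there is no history."""
        return self._successor.get(self._last)

    def update(self, actual):
        """Record the identifier of the message that actually arrived."""
        if self._last is not None:
            self._successor[self._last] = actual
        self._last = actual


def hit_ratio(trace):
    """Fraction of messages in `trace` whose identifier was predicted correctly.

    Messages for which no prediction is available count as misses.
    """
    predictor = LastSuccessorPredictor()
    hits = 0
    total = 0
    for actual in trace:
        if predictor.predict() == actual:
            hits += 1
        predictor.update(actual)
        total += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Assumed trace: a receiver that gets messages from ranks 1, 2, 3 in a repeating cycle.
    trace = [1, 2, 3] * 4
    print(f"hit ratio: {hit_ratio(trace):.2f}")
```

On a strictly periodic reception pattern like the assumed trace, this heuristic misses only while it is learning the first cycle. A hit would let the receiver prepare a destination for the predicted message before the matching receive call is posted; how the paper's predictors exploit that is beyond what the abstract states.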

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Afsahi, A., Dimopoulos, N.J. (2000). Efficient Communication Using Message Prediction for Cluster of Multiprocessors. In: Falsafi, B., Lauria, M. (eds) Network-Based Parallel Computing. Communication, Architecture, and Applications. CANPC 2000. Lecture Notes in Computer Science, vol 1797. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720115_12

  • DOI: https://doi.org/10.1007/10720115_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67879-3

  • Online ISBN: 978-3-540-44655-2

  • eBook Packages: Springer Book Archive