
Evaluating the Effect of Coherence Protocols on the Performance of Parallel Programming Constructs

International Journal of Parallel Programming

Abstract

Different implementations of parallel programming constructs interact heavily with a multiprocessor's coherence protocol and thus may have a significant impact on performance. However, the form and extent of this interaction have not yet been established, particularly for update-based coherence protocols. In this paper we study the running time and communication behavior of ticket and MCS spin locks; centralized, dissemination, and tree-based barriers; parallel and sequential reductions; linear broadcasting and producer- and consumer-driven logarithmic broadcasting; and centralized and distributed task queues, under pure and competitive update coherence protocols on a scalable multiprocessor. Results for a write-invalidate protocol are presented mostly for comparison. Our experiments indicate that parallel programming techniques that are well established for write-invalidate protocols, such as MCS locks and task queues, are often inappropriate for update-based protocols. In contrast, techniques such as dissemination and tree barriers achieve superior performance under update-based protocols. Our results also show that update-based protocols sometimes call for different design decisions than write-invalidate protocols. Our main conclusion is that the interaction of parallel programming constructs with the multiprocessor's coherence protocol does have a significant impact on performance: the implementation of these constructs must be carefully matched to the coherence protocol if the best performance is to be achieved.
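To make the dissemination barrier's communication pattern concrete, here is a minimal sketch of the classic dissemination scheme (Hensgen, Finkel, and Manber; also presented by Mellor-Crummey and Scott). It assumes processors numbered 0 to P-1, where in round k processor i signals processor (i + 2^k) mod P. After ⌈log2 P⌉ rounds, every processor has heard, directly or indirectly, of every other processor's arrival. The function names `dissemination_rounds`, `partner_schedule`, and `simulate` are hypothetical. The model ignores caches and coherence traffic entirely. What it does show is that, in each round, every flag has exactly one writer and one reader. Producer-consumer sharing of this kind is what an update protocol can forward directly to the consumer instead of invalidating and re-fetching. This is offered as a plausible intuition for the barrier results, not as the paper's own analysis.

```python
# Minimal sketch of the dissemination barrier's signalling pattern.
# Assumption: processors are numbered 0..p-1 and, in round k, processor i
# signals processor (i + 2**k) % p. Only "who has heard about whose
# arrival" is modelled; caches and coherence traffic are NOT modelled.


def dissemination_rounds(p: int) -> int:
    """Rounds needed for p processors: ceil(log2 p), or 0 when p == 1."""
    if p < 1:
        raise ValueError("need at least one processor")
    return (p - 1).bit_length()


def partner_schedule(p: int) -> list[list[int]]:
    """schedule[k][i] is the processor that i signals in round k."""
    return [[(i + (1 << k)) % p for i in range(p)]
            for k in range(dissemination_rounds(p))]


def simulate(p: int) -> list[set[int]]:
    """Return, for each processor, the set of arrivals it knows of at exit."""
    known = [{i} for i in range(p)]  # initially each processor knows only itself
    for targets in partner_schedule(p):
        # Signals within one round happen concurrently, so read a snapshot.
        snapshot = [set(s) for s in known]
        for sender, receiver in enumerate(targets):
            known[receiver] |= snapshot[sender]
    return known


if __name__ == "__main__":
    for p in (2, 5, 8):
        rounds = dissemination_rounds(p)
        released = all(s == set(range(p)) for s in simulate(p))
        print(f"p={p}: {rounds} rounds, {p * rounds} flag writes, "
              f"all released: {released}")
```

With p = 8, for example, each processor performs 3 signal/wait pairs, for 24 flag writes in total. Because i → (i + 2^k) mod P is a permutation in every round, each processor receives exactly one signal per round, so no flag is ever contended by several writers.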




About this article

Cite this article

Bianchini, R., Carrera, E.V. & Kontothanassis, L. Evaluating the Effect of Coherence Protocols on the Performance of Parallel Programming Constructs. International Journal of Parallel Programming 26, 143–181 (1998). https://doi.org/10.1023/A:1018744919483


  • DOI: https://doi.org/10.1023/A:1018744919483
