Abstract
Even for small multi-core systems, it has become harder and harder to support a simple shared memory abstraction: processors access some memory regions more quickly than others, a phenomenon called non-uniform memory access (NUMA). These trends have prompted researchers to investigate alternative programming abstractions based on message passing rather than cache-coherent shared memory. To advance a pragmatic understanding of these models’ strengths and weaknesses, we have explored a range of different message passing and shared memory designs, for a variety of concurrent data structures, running on different multicore architectures. Our goal was to evaluate which combinations perform best, and where simple software or hardware optimizations might have the most impact. We observe that different approaches perform best in different circumstances, and that the communication overhead of message passing can often outweigh its benefits. Nonetheless, we discuss ways in which this balance may shift in the future. Overall, we conclude that, by emphasizing high-level shared data abstractions, software should be designed to be largely independent of the choice of low-level communication mechanism.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Baumann, A., Barham, P., Dagand, P.-E., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schüpbach, A., Singhania, A.: The multikernel: a new OS architecture for scalable multicore systems. In: Proc. ACM SIGOPS Symposium on Operating Systems Principles (SOSP), pp. 29–44 (2009)
Calciu, I., Gottschlich, J.E., Herlihy, M.: Using elimination and delegation to implement a scalable NUMA-friendly stack. In: Proc. Usenix Workshop on Hot Topics in Parallelism (HotPar) (2013)
Dashti, M., Fedorova, F., Funston, J., Gaud, F., Lachaize, R., Lachaize, B., Quema, V., Quema, M.: Traffic management: a holistic approach to memory placement on NUMA systems. In: Proc. Conf. on Arch. Support for Prog. Lang. and Op. Systems (ASPLOS), pp. 381–394 (2013)
Dice, D.: NUMA-aware placement of communication variables (November 2012), blogs.oracle.com/dave/entry/numa_aware_placement_of_communication1
Dice, D., Marathe, V.J., Shavit, N.: Lock cohorting: a general technique for designing NUMA locks. In: Proc. ACM Symp. on Principles and Practice of Parallel Programming (PPoPP), pp. 247–256 (2012)
Dice, D., Otenko, O.: Brief announcement: multilane - a concurrent blocking multiset. In: Proc. ACM SPAA, pp. 313–314 (2011)
Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Flat-combining and the synchronization parallelism tradeoff. In: Proceedings of the Twenty Third ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 355–364 (June 2010)
Hendler, D., Shavit, N., Yerushalmi, L.: A scalable lock-free stack algorithm. In: Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 206–215 (2004)
Lauer, H.C., Needham, R.M.: On the duality of operating system structures. SIGOPS Oper. Syst. Rev. 13(2), 3–19 (1979)
Lozi, J.-P., David, F., Thomas, G., Lawall, J., Muller, G.: Remote core locking: Migrating critical-section execution to improve the performance of multithreaded applications. In: Proc. USENIX Annual Technical Conference, ATC (2012)
Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9(1), 21–65 (1991)
Metreveli, Z., Zeldovich, N., Kaashoek, M.F.: Cphash: a cache-partitioned hash table. In: Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2012, pp. 319–320. ACM, New York (2012)
Oracle Corporation. Oracle’s Sun Fire X4800 Server Architecture (2010), www.oracle.com/technetwork/articles/systems-hardware-architecture/sf4800g5-architecture-163848.pdf
Oracle Corporation. Oracle’s SPARC T4-1, SPARC T4-2, SPARC T4-4, and SPARC T4-1B Server Architecture (2012), www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/o11-090-sparc-t4-arch-496245.pdf
Oyama, Y., Taura, K., Yonezawa, A.: Executing parallel programs with synchronization bottlenecks efficiently. In: Proc. Int. Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, PDSIA (1999)
Suleman, M.A., Mutlu, O., Qureshi, M.K., Patt, Y.N.: Accelerating critical section execution with asymmetric multi-core architectures. In: Proc. Conf. on Arch. Support for Prog. Lang. and Op. Systems (ASPLOS), pp. 253–264 (2009)
von Eicken, T., Culler, D.E., Goldstein, S.C., Schauser, K.E.: Active messages: a mechanism for integrated communication and computation. In: Proc. Int. Symposium on Computer Architecture (ISCA), pp. 256–266 (1992)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Calciu, I. et al. (2013). Message Passing or Shared Memory: Evaluating the Delegation Abstraction for Multicores. In: Baldoni, R., Nisse, N., van Steen, M. (eds) Principles of Distributed Systems. OPODIS 2013. Lecture Notes in Computer Science, vol 8304. Springer, Cham. https://doi.org/10.1007/978-3-319-03850-6_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-03850-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03849-0
Online ISBN: 978-3-319-03850-6
eBook Packages: Computer ScienceComputer Science (R0)