TM²C: a software transactional memory for many-cores

Distributed Computing

Abstract

Transactional memory is an appealing paradigm for concurrent systems. Many software implementations of the paradigm have been proposed over the past two decades, for both shared-memory multi-core systems and clusters of distributed machines. Chip manufacturers, however, have started producing many-core architectures with low network-on-chip communication latencies and limited support for cache coherence, rendering existing transactional-memory implementations inapplicable. This paper presents TM²C, the first software transactional memory protocol for many-core systems, hence featuring transactions that are both distributed and able to leverage shared memory. TM²C exploits fast messages over the network-on-chip to make accesses to shared data coherent. In particular, it relies on visible read accesses to detect conflicts eagerly and incorporates the first distributed contention manager that guarantees the commit of all transactions. We evaluate TM²C on Intel, AMD and Tilera architectures, ranging from common multi-cores to experimental many-cores. We build upon new message-passing protocols, based on both software and hardware, which are interesting in their own right. Our results on various benchmarks, including realistic banking and MapReduce applications, show that TM²C scales well regardless of the underlying platform.
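To make the combination of visible reads, eager conflict detection and a commit-guaranteeing contention manager more concrete, the Python sketch below models a single lock-service core that, by assumption, owns a partition of the shared addresses. It is a hypothetical illustration, not TM²C's implementation: the names `LockEntry`, `LockService`, `read_lock`, `write_lock` and `release_all` are invented here, messages over the network-on-chip are modelled as plain method calls, and the contention policy is a Greedy-style rule (cf. [33]) in which the transaction with the earlier start timestamp wins. The abstract does not specify TM²C's exact policy, so treat this rule as a stand-in.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class LockEntry:
    """Per-address metadata held by one (hypothetical) lock-service core."""
    readers: Set[str] = field(default_factory=set)  # visible readers (tx ids)
    writer: Optional[str] = None                     # tx id owning the write lock


class LockService:
    """Toy model of a service core that owns a partition of the addresses.

    In a many-core system each call below would be a request message; here it
    is a plain method call. Conflicts are detected eagerly, when a lock is
    requested, and resolved Greedy-style: the older transaction wins.
    """

    def __init__(self) -> None:
        self.locks: Dict[int, LockEntry] = {}  # address -> lock metadata
        self.ts: Dict[str, int] = {}           # tx id -> start timestamp

    def begin(self, tx: str, timestamp: int) -> None:
        # A retried transaction keeps its first timestamp, so it eventually
        # becomes the oldest one and can no longer be aborted.
        self.ts.setdefault(tx, timestamp)

    def _older(self, a: str, b: str) -> bool:
        # Total order on transactions: timestamp first, tx id breaks ties.
        return (self.ts[a], a) < (self.ts[b], b)

    def release_all(self, tx: str) -> None:
        """Drop every lock `tx` holds here (called on commit or abort)."""
        for entry in self.locks.values():
            entry.readers.discard(tx)
            if entry.writer == tx:
                entry.writer = None

    def _resolve(self, tx: str, holders: Set[str]) -> Tuple[bool, Set[str]]:
        # Grant only if tx is older than every conflicting holder; those
        # holders are then aborted. Otherwise the requester aborts itself.
        holders = holders - {tx}
        if all(self._older(tx, h) for h in holders):
            for h in holders:
                self.release_all(h)
            return True, holders
        self.release_all(tx)
        return False, {tx}

    def read_lock(self, tx: str, addr: int) -> Tuple[bool, Set[str]]:
        """Register tx as a visible reader; returns (granted, aborted txs)."""
        entry = self.locks.setdefault(addr, LockEntry())
        conflicting = {entry.writer} if entry.writer is not None else set()
        granted, aborted = self._resolve(tx, conflicting)
        if granted:
            entry.readers.add(tx)
        return granted, aborted

    def write_lock(self, tx: str, addr: int) -> Tuple[bool, Set[str]]:
        """Acquire exclusive ownership; readers and the writer conflict."""
        entry = self.locks.setdefault(addr, LockEntry())
        conflicting = set(entry.readers)
        if entry.writer is not None:
            conflicting.add(entry.writer)
        granted, aborted = self._resolve(tx, conflicting)
        if granted:
            entry.writer = tx
        return granted, aborted


svc = LockService()
svc.begin("T1", timestamp=1)       # older transaction
svc.begin("T2", timestamp=2)       # younger transaction
print(svc.read_lock("T2", 0x40))   # (True, set()): T2 becomes a visible reader
print(svc.write_lock("T1", 0x40))  # (True, {'T2'}): older writer aborts the reader
print(svc.read_lock("T2", 0x40))   # (False, {'T2'}): younger reader must retry
```

Because readers are registered at the service, a writer learns about them at lock-acquisition time rather than at commit. In this model the oldest live transaction is never aborted, since it only ever loses to an older one, so as long as aborted transactions retry with their original timestamp each of them eventually commits.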

Notes

  1. In Sect. 4 we explain how we are able to support a single-byte granularity without excessive metadata overhead.

  2. Tilera was acquired by EZchip in 2014, which in turn was acquired by Mellanox in 2015.

  3. https://github.com/trigonak/ssmp.

  4. http://communities.intel.com/docs/DOC-6003.

  5. In the case where all sharers of a cache line are on the same socket, the latency is 120 ns instead of the 40 ns it could be.

  6. The Intel C/C++ compiler and gcc support it.

  7. Compiling software for SCC and for Tilera requires custom versions of the icc and gcc compilers, respectively. Neither of these supports the automated instrumentation of transactions.

  8. Others are JudoSTM [60] and NOrec [20].

  9. We have validated that the results are practically identical for longer durations as well (e.g., 10 or 60 s).

  10. For read-heavy workloads, STM algorithms that use timestamps and transactional-load revalidation, such as TL2 [23] and TinySTM [27], are more suitable.

  11. To be precise, it commits one transaction after all the other threads stop executing at the end of the benchmark.
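Note 10 points to a design that differs from the visible reads described in the abstract: invisible reads validated with timestamps, as in TL2 [23] and TinySTM [27]. The Python sketch below is a deliberately simplified, single-threaded model of TL2-style validation, written for illustration only; the names `VersionedStore`, `Tx` and `Abort` are hypothetical and do not come from either system. A transaction snapshots a global version clock when it starts, checks a location's version on every load, and revalidates its whole read set at commit.

```python
from typing import Dict, Set


class Abort(Exception):
    """Raised when a transaction detects that its snapshot is stale."""


class VersionedStore:
    """Shared state: values plus one version number per address."""

    def __init__(self) -> None:
        self.clock = 0                      # global version clock
        self.values: Dict[int, int] = {}    # address -> committed value
        self.versions: Dict[int, int] = {}  # address -> version of last write


class Tx:
    """Invisible-read transaction validated against timestamps (TL2-like)."""

    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.rv = store.clock               # read version: clock at start
        self.read_set: Set[int] = set()
        self.write_set: Dict[int, int] = {}

    def _stale(self, addr: int) -> bool:
        # A location written after this transaction started is too new.
        return self.store.versions.get(addr, 0) > self.rv

    def read(self, addr: int) -> int:
        if addr in self.write_set:          # read-after-write in this tx
            return self.write_set[addr]
        if self._stale(addr):
            raise Abort(addr)
        self.read_set.add(addr)             # no other transaction sees this read
        return self.store.values.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self.write_set[addr] = value        # buffered until commit

    def commit(self) -> None:
        # Real TL2 locks the write set and increments the clock before
        # validating; this single-threaded model omits the locks, and the
        # order of validation and increment does not matter here.
        for addr in self.read_set:          # revalidate every transactional load
            if self._stale(addr):
                raise Abort(addr)
        self.store.clock += 1
        wv = self.store.clock               # write version for this commit
        for addr, value in self.write_set.items():
            self.store.values[addr] = value
            self.store.versions[addr] = wv


store = VersionedStore()
a, b = Tx(store), Tx(store)
a.read(0x10)               # a reads 0x10 at version 0 without announcing it
b.write(0x10, 7)
b.commit()                 # clock -> 1, version(0x10) -> 1
a.write(0x20, 1)
try:
    a.commit()             # revalidation finds version 1 > a.rv == 0
except Abort:
    print("a aborted")
```

In this model reads never write shared metadata, which is the intuition behind preferring such schemes for read-heavy workloads; the trade-off is that a conflict with a concurrent writer surfaces only when the reader validates, rather than eagerly when the writer acquires its lock.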

References

  1. Abts, D., Enright Jerger, N.D., Kim, J., Gibson, D., Lipasti, M.H.: Achieving predictable performance through better memory controller placement in many-core CMPs. In: ISCA, pp. 451–461 (2009)

  2. Aguilera, M., Merchant, A., Veitch, A., Karamanolis, C.: Sinfonia: a new paradigm for building scalable distributed systems. In: SOSP (2007)

  3. Attiya, H., Gramoli, V., Milani, A.: Brief announcement: combine—an improved directory-based consistency protocol. In: SPAA, pp. 72–73 (2010)

  4. Attiya, H., Gramoli, V., Milani, A.: A provably starvation-free distributed directory protocol. In: SSS, pp. 405–419 (2010)

  5. Balaji, P., Narravula, S., Vaidyanathan, K., Krishnamoorthy, S., Wu, J., Panda, D.K.: Sockets direct protocol over InfiniBand in clusters: is it beneficial? In: ISPASS, pp. 28–35 (2004)

  6. Baumann, A., Barham, P., Dagand, P.-E., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schupbach, A., Singhania, A.: The multikernel: a new OS architecture for scalable multicore systems. In: SOSP, pp. 29–44 (2009)

  7. Bayer, R., Schkolnick, M.: Concurrency of operations on B-trees. Acta Inf. 9(1), 1–21 (1977)

  8. Berezecki, M., Frachtenberg, E., Paleczny, M., Steele, K.: Many-core key-value store. In: IGCC, pp. 1–8 (2011)

  9. Bieniusa, A., Fuhrmann, T.: Consistency in hindsight: a fully decentralized STM algorithm. In: IPDPS, pp. 1–12 (2010)

  10. Bocchino, R., Adve, V., Chamberlain, B.: Software transactional memory for large scale clusters. In: PPoPP, pp. 247–258 (2008)

  11. Borkar, S.: Thousand core chips: a technology perspective. In: DAC, pp. 746–749 (2007)

  12. Borkar, S., Chien, A.A.: The future of microprocessors. Commun. ACM 54(5), 67–77 (2011)

  13. Boyd-Wickizer, S., Clements, A.T., Mao, Y., Pesterev, A., Kaashoek, M.F., Morris, R., Zeldovich, N.: An analysis of Linux scalability to many cores. In: OSDI (2010)

  14. Boyd-Wickizer, S., Kaashoek, M.F., Morris, R., Zeldovich, N.: Non-scalable locks are dangerous. In: Proceedings of the Linux Symposium (2012)

  15. Carvalho, N., Romano, P., Rodrigues, L.: Asynchronous lease-based replication of software transactional memory. In: Middleware, pp. 376–396 (2010)

  16. Carvalho, N., Romano, P., Rodrigues, L.: SCert: Speculative certification in replicated software transactional memories. In: SYSTOR, pp. 10:1–10:13 (2011)

  17. Choi, B., Komuravelli, R., Sung, H., Smolinski, R., Honarmand, N., Adve, S.V., Adve, V.S., Carter, N.P., Chou, C.-T.: DeNovo: rethinking the memory hierarchy for disciplined parallelism. In: PACT, pp. 155–166 (2011)

  18. Conway, P., Kalyanasundharam, N., Donley, G., Lepak, K., Hughes, B.: Cache hierarchy and memory subsystem of the AMD Opteron processor. IEEE Micro 30(2), 16–29 (2010)

  19. Couceiro, M., Romano, P., Carvalho, N., Rodrigues, L.: D2STM: dependable distributed software transactional memory. In: PRDC, pp. 307–313 (2009)

  20. Dalessandro, L., Spear, M.F., Scott, M.L.: NOrec: streamlining STM by abolishing ownership records. In: PPoPP (2010)

  21. David, T., Guerraoui, R., Trigonakis, V.: Everything you always wanted to know about synchronization but were afraid to ask. In: SOSP, pp. 33–48 (2013)

  22. Défago, X., Schiper, A., Urbán, P.: Total order broadcast and multicast algorithms: taxonomy and survey. ACM Computing Surveys, pp. 372–421 (2004)

  23. Dice, D., Shalev, O., Shavit, N.: Transactional locking II. In: DISC, pp. 194–208 (2006)

  24. Dice, D., Shavit, N.: TLRW: return of the read-write lock. In: SPAA (2010)

  25. Dragojevic, A., Felber, P., Gramoli, V., Guerraoui, R.: Why STM can be more than a research toy. Commun. ACM 54(4), 70–77 (2011)

  26. Fan, B., Andersen, D.G., Kaminsky, M.: MemC3: compact and concurrent memcache with dumber caching and smarter hashing. In: NSDI (2013)

  27. Felber, P., Fetzer, C., Riegel, T.: Dynamic performance tuning of word-based software transactional memory. In: PPoPP, pp. 237–246 (2008)

  28. Felber, P., Gramoli, V., Guerraoui, R.: Elastic transactions. In: DISC, pp. 93–107 (2009)

  29. Ferdman, M., Adileh, A., Kocberber, O., Volos, S., Alisafaee, M., Jevdjic, D., Kaynak, C., Popescu, A.D., Ailamaki, A., Falsafi, B.: Quantifying the mismatch between emerging scale-out applications and modern processors. ACM Trans. Comput. Syst. 30(4), 15:1–15:24 (2012)

  30. Gramoli, V.: More than you ever wanted to know about synchronization: Synchrobench, measuring the impact of the synchronization on concurrent algorithms. In: PPoPP, pp. 1–10 (2015)

  31. Gramoli, V., Guerraoui, R., Trigonakis, V.: TM2C: a software transactional memory for many-cores. In: EuroSys, pp. 351–364 (2012)

  32. Gray, J.: Notes on data base operating systems. In: Operating Systems, An Advanced Course, volume 60 of LNCS, pp. 393–481 (1978)

  33. Guerraoui, R., Herlihy, M., Pochon, B.: Toward a theory of transactional contention managers. In: PODC, pp. 258–264 (2005)

  34. Guerraoui, R., Kapalka, M.: The semantics of progress in lock-based transactional memory. In: POPL, pp. 404–415 (2009)

  35. Guerraoui, R., Kapalka, M.: Principles of Transactional Memory. Synthesis Lectures on Distributed Computing Theory. Morgan & Claypool Publishers, San Rafael (2010)

  36. Harmanci, D., Gramoli, V., Felber, P., Fetzer, C.: Extensible transactional memory testbed. J. Parallel Distrib. Comput. 70(10), 1053–1067 (2010)

  37. Harris, T., Larus, J.R., Rajwar, R.: Transactional Memory. Synthesis Lectures on Computer Architecture, 2nd edn. Morgan & Claypool Publishers, San Rafael (2010)

  38. Herlihy, M., Luchangco, V., Moir, M.: A flexible framework for implementing software transactional memory. In: OOPSLA, pp. 253–262 (2006)

  39. Herlihy, M., Luchangco, V., Moir, M., Scherer, W.: Software transactional memory for dynamic-sized data structures. In: PODC, pp. 92–101 (2003)

  40. Herlihy, M., Moss, J.E.B.: Transactional memory: architectural support for lock-free data structures. In: ISCA, pp. 289–300 (1993)

  41. Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Elsevier (2012). (Revised Reprint)

  42. Herlihy, M., Sun, Y.: Distributed transactional memory for metric-space networks. In: DISC, pp. 58–208 (2005)

  43. Howard, J., Dighe, S., Hoskote, Y., Vangal, S., Finan, D., Ruhl, G., Jenkins, D., Wilson, H., Borkar, N., Schrom, G., Pailet, F., Jain, S., Jacob, T., Yada, S., Marella, S., Salihundam, P., Erraguntla, V., Konow, M., Riepen, M., Droege, G., Lindemann, J., Gries, M., Apel, T., Henriss, K., Lund-Larsen, T., Steibl, S., Borkar, S., De, V., Van Der Wijngaart, R., Mattson, T.: A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS. In: ISSCC, pp. 108–109 (2010)

  44. Intel transactional memory ABI. http://software.intel.com/sites/default/files/m/5/a/2/a/f/8097-Intel_TM_ABI_1_0_1.pdf (2009)

  45. Jacobi, C., Slegel, T., Greiner, D.: Transactional memory architecture and implementation for IBM System z. In: MICRO, pp. 25–36 (2012)

  46. Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Scalability of write-ahead logging on multicore and multisocket hardware. VLDB J. 21(2), 239–263 (2012)

  47. Jose, J., Subramoni, H., Luo, M., Zhang, M., Huang, J., Wasi-ur Rahman, M., Islam, N.S., Ouyang, X., Wang, H., Sur, S., Panda, D.K.: Memcached design on high performance RDMA capable interconnects. In: ICPP, pp. 743–752 (2011)

  48. Kelm, J.H., Johnson, D.R., Tuohy, W., Lumetta, S.S., Patel, S.J.: Cohesion: a hybrid memory model for accelerators. In: ISCA, pp. 429–440 (2010)

  49. Kontothanassis, L., Scott, M.: Software cache coherence for large scale multiprocessors. In: HPCA, pp. 286–295 (1995)

  50. Kotselidis, C., Ansari, M., Jarvis, K., Luján, M., Kirkham, C., Watson, I.: DiSTM: a software transactional memory framework for clusters. In: ICPP, pp. 51–58 (2008)

  51. Lenoski, D., Laudon, J., Gharachorloo, K., Gupta, A., Hennessy, J.: The directory-based cache coherence protocol for the DASH multiprocessor. In: ISCA, pp. 148–159 (1990)

  52. Lim, H., Fan, B., Andersen, D.G., Kaminsky, M.: SILT: a memory-efficient, high-performance key-value store. In: SOSP, pp. 1–13 (2011)

  53. Liskov, B.: The Argus language and system. In: Distributed Systems: Methods and Tools for Specification, An Advanced Course, volume 190 of LNCS, pp. 343–430 (1985)

  54. Manassiev, K., Mihailescu, M., Amza, C.: Exploiting distributed version concurrency in a transactional memory cluster. In: PPoPP, pp. 198–208 (2006)

  55. Martin, M., Blundell, C., Lewis, E.: Subtleties of transactional memory atomicity semantics. IEEE Comput. Archit. Lett. 5 (2006)

  56. Martin, M.M.K., Hill, M.D., Sorin, D.J.: Why on-chip cache coherence is here to stay. Commun. ACM 55(7), 78–89 (2012)

  57. Mattson, T.G., Riepen, M., Lehnig, T., Brett, P., Haas, W., Kennedy, P., Howard, J., Vangal, S., Borkar, N., Ruhl, G., Dighe, S.: The 48-core SCC processor: the programmer’s view. In: SC, pp. 1–11 (2010)

  58. Mellor-Crummey, J., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM TOCS 9(1), 21–65 (1991)

  59. Michael, M.M.: Hazard pointers: safe memory reclamation for lock-free objects. IEEE Trans. Parallel Distrib. Syst. 15(6), 491–504 (2004)

  60. Olszewski, M., Cutler, J., Steffan, J.G.: JudoSTM: a dynamic binary-rewriting approach to software transactional memory. In: PACT, pp. 365–375 (2007)

  61. Papamarcos, M.S., Patel, J.H.: A low-overhead coherence solution for multiprocessors with private cache memories. In: ISCA, pp. 348–354 (1984)

  62. Pritchett, D.: BASE: an ACID alternative. Queue 6(3), 48–55 (2008)

  63. Rajwar, R., Goodman, J.R.: Speculative lock elision: enabling highly concurrent multithreaded execution. In: MICRO, pp. 294–305 (2001)

  64. Romano, P., Carvalho, N., Rodrigues, L.: Towards distributed software transactional memory systems. In: LADIS, pp. 1–4 (2008)

  65. Romano, P., Rodrigues, L., Carvalho, N., Cachopo, J.: Cloud-TM: harnessing the cloud with distributed transactional memories. SIGOPS Oper. Syst. Rev. 44(2), 1–6 (2010)

  66. Saad, M., Ravindran, B.: Snake: control flow distributed software transactional memory. In: SSS, pp. 238–252 (2011)

  67. Saad, M., Ravindran, B.: Transactional Forwarding Algorithm. Technical Report, Virginia Tech (2011)

  68. Scherer, W., Scott, M.: Contention management in dynamic software transactional memory. In: PODC Workshop on Concurrency and Synchronization in Java Programs (2004)

  69. Scherer, W., Scott, M.: Advanced contention management for dynamic software transactional memory. In: PODC, pp. 240–248 (2005)

  70. Sewall, J., Chhugani, J., Kim, C., Satish, N., Dubey, P.: PALM: parallel architecture-friendly latch-free modifications to B+ trees on many-core processors. PVLDB 4(11), 795–806 (2011)

  71. Shavit, N., Touitou, D.: Software transactional memory. In: PODC, pp. 204–213 (1995)

  72. Spear, M.F., Marathe, V.J., Dalessandro, L., Scott, M.L.: Privatization techniques for software transactional memory. In: PODC (2007)

  73. Tilera TILE-Gx. http://www.mellanox.com/related-docs/prod_multi_core/PB_TILE-Gx36.pdf (2014)

  74. Wang, A., Gaudet, M., Wu, P., Amaral, J.N., Ohmacht, M., Barton, C., Silvera, R., Michael, M.: Evaluation of Blue Gene/Q hardware support for transactional memories. In: PACT, pp. 127–136 (2012)

  75. Welc, A., Saha, B., Adl-Tabatabai, A.-R.: Irrevocable transactions and their applications. In: SPAA (2008)

  76. Zhang, B.: On the Design of Contention Managers and Cache-Coherence Protocols for Distributed Transactional Memory. Ph.D. Thesis, Virginia Tech (2009)

  77. Zhang, B., Ravindran, B.: Relay: a cache-coherence protocol for distributed transactional memory. In: OPODIS, pp. 48–53 (2009)

Author information

Corresponding author

Correspondence to Vasileios Trigonakis.

Additional information

A preliminary version of this paper appeared in the proceedings of EuroSys 2012 [31]. We extend the conference version with (1) results from an optimized implementation of the system, (2) an evaluation on cache-coherent multi-cores, (3) an evaluation on a Tilera architecture, (4) an implementation of the system using shared memory and locks, (5) performance comparisons with a state-of-the-art STM, and (6) important details about the implementation of the system.

We wish to thank Maurice Herlihy for his comments on an earlier version of this paper and the Intel MARC Community for its support while programming on SCC. Part of the research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007–2013) under grant agreement 248465, the S(o)OS project. NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program.

Author names appear in alphabetical order.

About this article

Cite this article

Gramoli, V., Guerraoui, R. & Trigonakis, V. TM²C: a software transactional memory for many-cores. Distrib. Comput. 31, 367–388 (2018). https://doi.org/10.1007/s00446-017-0310-6

  • DOI: https://doi.org/10.1007/s00446-017-0310-6
