A survey on optimizations towards best-effort hardware transactional memory

  • Survey Paper
CCF Transactions on High Performance Computing

Abstract

Transactional memory has attracted increasing attention in recent years. It provides optimistic concurrency control for shared-memory parallel programs. The rapid development and wide adoption of transactional memory make this programming paradigm a promising route to breakthroughs in massively parallel computing. Numerous transactional memory systems have been proposed, aiming to provide relatively simple and intuitive synchronization constructs for shared-memory parallel programs without sacrificing performance. Hardware transactional memory (HTM) is now commercially available in mainstream processors. However, several inherent architectural limitations abort hardware transactions, such as cache overflows, context switches, and hardware and software exceptions. As a result, today's HTM systems are best-effort: no hardware transaction is guaranteed to commit, so a software fallback path is required to ensure forward progress. In this paper, we survey state-of-the-art software-side optimizations for best-effort HTM systems, as well as several novel performance tuning techniques. Research efforts on the joint use of HTM and non-volatile memory (NVM) are also discussed.
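A common way to meet the fallback requirement is a retry-then-lock policy. The C++ sketch below illustrates one such policy under stated assumptions; it is not the method of any specific system surveyed here. `ScriptedHtm` is a hypothetical stand-in for real HTM instructions (for example Intel RTM's `_xbegin`/`_xend`) that replays pre-programmed outcomes so the logic runs on any CPU, and `TxStatus`, `FallbackLock`, `ExecResult`, `execute_best_effort`, and the retry budget are likewise illustrative names. Transient conflict aborts are retried, a capacity abort goes straight to a single global fallback lock, and each attempt checks the lock first (lock subscription) so that speculation never commits while another thread is on the fallback path.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Illustrative abort causes (hypothetical names, loosely modeled on RTM status bits).
enum class TxStatus { Committed, AbortConflict, AbortCapacity, AbortLockHeld };

// Single global lock used by the software fallback path (a simple spinlock).
class FallbackLock {
 public:
  void lock() {
    while (held_.exchange(true, std::memory_order_acquire)) std::this_thread::yield();
  }
  void unlock() { held_.store(false, std::memory_order_release); }
  bool is_held() const { return held_.load(std::memory_order_acquire); }

 private:
  std::atomic<bool> held_{false};
};

// Hypothetical stand-in for HTM: replays scripted outcomes so the policy can be
// exercised on any CPU. Once the script runs out, every attempt commits.
class ScriptedHtm {
 public:
  explicit ScriptedHtm(std::vector<TxStatus> script) : script_(std::move(script)) {}

  // "Speculatively" runs body; its effects are applied only on commit.
  TxStatus run(const std::function<void()>& body, const FallbackLock& sgl) {
    // Lock subscription: a transaction must not commit while a thread is on
    // the fallback path, so it aborts if the lock is held.
    if (sgl.is_held()) return TxStatus::AbortLockHeld;
    TxStatus s = next_ < script_.size() ? script_[next_++] : TxStatus::Committed;
    if (s == TxStatus::Committed) body();
    return s;
  }

 private:
  std::vector<TxStatus> script_;
  std::size_t next_ = 0;
};

struct ExecResult {
  bool used_fallback;  // true if the body ran under the global lock
  int hw_attempts;     // number of (simulated) hardware transactions started
};

// Retry-then-lock policy for a best-effort HTM (illustrative sketch).
ExecResult execute_best_effort(ScriptedHtm& htm, const std::function<void()>& body,
                               FallbackLock& sgl, int max_retries) {
  assert(max_retries > 0);
  int attempts = 0;
  while (attempts < max_retries) {
    // Wait for the fallback lock to be free before speculating; this mitigates
    // the "lemming effect", where aborted threads pile onto the fallback path.
    while (sgl.is_held()) std::this_thread::yield();
    ++attempts;
    TxStatus s = htm.run(body, sgl);
    if (s == TxStatus::Committed) return {false, attempts};
    if (s == TxStatus::AbortCapacity) break;  // likely persistent: stop retrying
    // Conflict and lock-held aborts are treated as transient: retry.
  }
  std::lock_guard<FallbackLock> guard(sgl);  // software fallback path
  body();
  return {true, attempts};
}
```

With real RTM, `run` would instead call `_xbegin()`, read the lock word, execute the body, and call `_xend()`, mapping the `_XABORT_CONFLICT` and `_XABORT_CAPACITY` bits of the abort status onto these cases. Whether a capacity abort should skip the remaining retries, and how large the retry budget should be, are tuning choices rather than fixed rules.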

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper. This work is supported by the National High-level Personnel for Defense Technology Program (2017-JCJQ-ZQ-013) and NSF 61902405.

Author information

Corresponding author

Correspondence to Kai Lu.

About this article

Cite this article

Wu, Z., Lu, K., Wang, R. et al. A survey on optimizations towards best-effort hardware transactional memory. CCF Trans. HPC 2, 401–414 (2020). https://doi.org/10.1007/s42514-020-00049-2

  • DOI: https://doi.org/10.1007/s42514-020-00049-2
