A survey on optimizations towards best-effort hardware transactional memory

Wu, Zhenwei; Lu, Kai; Wang, Ruibo; Zhang, Wenzhe

doi:10.1007/s42514-020-00049-2

Survey Paper
Published: 15 September 2020

Volume 2, pages 401–414, (2020)
CCF Transactions on High Performance Computing

Zhenwei Wu (ORCID: orcid.org/0000-0002-2005-2831)¹, Kai Lu¹, Ruibo Wang¹ & Wenzhe Zhang¹

Abstract

Transactional memory, which provides optimistic concurrency control for shared-memory parallel programs, has attracted increasing attention in recent years. Its rapid development and wide adoption make this programming paradigm promising for achieving breakthroughs in massively parallel computing. A large body of work on transactional memory systems has aimed to provide a relatively simple and intuitive synchronization construct for shared-memory parallel programs without sacrificing performance. Hardware transactional memory (HTM) has become commercially available in mainstream processors. However, several inherent architectural limitations, such as cache overflows, context switches, and hardware and software exceptions, can abort hardware transactions, so current HTM systems are best-effort: they require a software fallback path to ensure forward progress. In this paper, we survey state-of-the-art software-side optimizations for best-effort hardware transaction systems, as well as several novel performance tuning techniques. Research efforts on the joint use of HTM and non-volatile memory (NVM) are also discussed.
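To make the role of the software fallback path concrete, the following is a minimal, single-threaded sketch of the retry-then-fallback control flow that best-effort HTM requires. It is not taken from the paper and does not use real HTM instructions: `ToyBestEffortHTM`, `try_transaction`, `execute_atomically`, and `max_retries` are hypothetical names, and aborts are scripted by an `abort_pattern` rather than caused by hardware.

```python
import threading


class ToyBestEffortHTM:
    """Hypothetical stand-in for best-effort HTM: any attempt may abort."""

    def __init__(self, abort_pattern):
        # abort_pattern: iterable of booleans, True meaning "this attempt aborts".
        self._aborts = iter(abort_pattern)

    def try_transaction(self, body, fallback_lock):
        # Model lock subscription: a transaction may not commit while
        # the fallback lock is held.
        if fallback_lock.locked():
            return False
        # Model an abort (cache overflow, context switch, exception, ...).
        # An aborted attempt leaves no effects, so the body is not run.
        if next(self._aborts, False):
            return False
        body()  # the attempt commits, so the body takes effect exactly once
        return True


def execute_atomically(htm, body, fallback_lock, max_retries=3):
    """Try the hardware path a bounded number of times, then take the lock."""
    for _ in range(max_retries):
        if htm.try_transaction(body, fallback_lock):
            return "htm"
    # Best-effort HTM gives no commit guarantee, so forward progress
    # must come from a non-speculative software path.
    with fallback_lock:
        body()
    return "fallback"
```

With a pattern such as `[True, False]` the second hardware attempt commits, whereas `[True, True, True]` exhausts the retry budget and the body runs under the lock. The retry bound of three is an arbitrary choice for illustration.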

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper. This work is supported by the National High-level Personnel for Defense Technology Program (2017-JCJQ-ZQ-013) and NSF 61902405.

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, 410073, Hunan, China
Zhenwei Wu, Kai Lu, Ruibo Wang & Wenzhe Zhang

Corresponding author

Correspondence to Kai Lu.

About this article

Cite this article

Wu, Z., Lu, K., Wang, R. et al. A survey on optimizations towards best-effort hardware transactional memory. CCF Trans. HPC 2, 401–414 (2020). https://doi.org/10.1007/s42514-020-00049-2

Received: 17 February 2020
Accepted: 30 July 2020
Published: 15 September 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s42514-020-00049-2
