Analysing software prefetching opportunities in hardware transactional memory

The Journal of Supercomputing

Abstract

Hardware transactional memory emerged to make parallel programming more accessible. Its main performance pitfall, however, is that an abort squashes the speculatively executed instructions and forces their re-execution, and repeated conflicts ultimately lead to serialization. A significant fraction of aborts occurs due to conflicts (concurrent reads and writes to the same memory location performed by different threads). Our proposal aims to reduce conflict aborts by shrinking the window of time during which transactional regions can suffer conflicts. We achieve this with software prefetch instructions inserted automatically at compile time. These prefetches are intended to bring the data each transaction needs from main memory into the cache before the transaction starts to execute, thus converting otherwise long-latency cache misses into hits during the execution of the transaction. The results show that our approach decreases the number of aborts by 30% on average and improves performance by up to 19% and 10% for two of the eight evaluated benchmarks. We provide insights into when the technique is beneficial given certain characteristics of the transactional regions, discuss the advantages and disadvantages of our approach, and finally outline potential solutions to overcome some of its limitations.
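The idea of prefetching a transaction's data before the transaction begins can be illustrated with a small hand-written sketch. It is not the authors' compiler pass: the prefetches here are written by hand, whereas the paper inserts them automatically at compile time. All names (`tx_begin`, `tx_end`, `add_to_bins`) are hypothetical, and the transaction boundaries are emulated with a simple spin lock so that the example runs on hardware without transactional memory support. `__builtin_prefetch` is the GCC/Clang builtin; on a real system the stand-ins would be replaced by the platform's transaction begin and commit instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for transaction begin/commit. A spin lock is used
 * only so the sketch runs anywhere; it does not model speculation or aborts. */
static atomic_flag tx_lock = ATOMIC_FLAG_INIT;

static void tx_begin(void)
{
    while (atomic_flag_test_and_set_explicit(&tx_lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
}

static void tx_end(void)
{
    atomic_flag_clear_explicit(&tx_lock, memory_order_release);
}

/* Adds delta to every bin named in idx[0..n-1] and returns the sum of the
 * values observed right after each update. */
static long add_to_bins(long *bins, const size_t *idx, size_t n, long delta)
{
    assert(bins != NULL && idx != NULL);

    /* Prefetch step, executed BEFORE the transaction starts: request the
     * cache lines the transaction is about to write (rw = 1) with high
     * temporal locality (3). A prefetch is only a hint and changes no data. */
    for (size_t i = 0; i < n; i++)
        __builtin_prefetch(&bins[idx[i]], 1, 3);

    /* Transactional region: ideally its accesses now hit in the cache, so
     * the region is shorter and exposed to conflicts for less time. */
    tx_begin();
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        bins[idx[i]] += delta;
        sum += bins[idx[i]];
    }
    tx_end();

    return sum;
}
```

The prefetch loop touches the same addresses as the transactional region but performs no updates, so it can run outside the transaction without affecting the result. Whether it actually helps depends on how much of the region's time is spent on cache misses, which is the kind of characteristic the abstract refers to.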

Notes

  1. If a function is called in two different transactions, we create one access phase (AP) version for each call context. AP versions are transaction-specific because the selection of instructions for each AP depends on how the memory updates performed within the function affect its callers.

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 819134), the Spanish MCIU and AEI, as well as the European Commission FEDER funds, under grant RTI2018-098156-B-C53, and the Swedish VR grant number 2016-05086.

Author information

Corresponding author

Correspondence to Alberto Ros.

About this article

Cite this article

Shimchenko, M., Titos-Gil, R., Fernández-Pascual, R. et al. Analysing software prefetching opportunities in hardware transactional memory. J Supercomput 78, 919–944 (2022). https://doi.org/10.1007/s11227-021-03897-z

  • DOI: https://doi.org/10.1007/s11227-021-03897-z