Analysing software prefetching opportunities in hardware transactional memory

Shimchenko, Marina; Titos-Gil, Rubén; Fernández-Pascual, Ricardo; Acacio, Manuel E.; Kaxiras, Stefanos; Ros, Alberto; Jimborean, Alexandra

doi:10.1007/s11227-021-03897-z

Published: 02 June 2021

Volume 78, pages 919–944 (2022)

The Journal of Supercomputing

Abstract

Hardware transactional memory emerged to make parallel programming more accessible. Its main performance pitfall, however, is that speculatively executed instructions are squashed and re-executed on an abort, and execution ultimately falls back to serialization when conflicts repeat. A significant fraction of aborts is caused by conflicts (concurrent reads and writes to the same memory location performed by different threads). Our proposal aims to reduce conflict aborts by shrinking the window of time during which transactional regions can suffer conflicts. We achieve this with software prefetch instructions inserted automatically at compile time. These prefetches are intended to bring the data each transaction needs from main memory into the cache before the transaction starts to execute, thus converting otherwise long-latency cache misses into hits during the execution of the transaction. Our results show that the approach decreases the number of aborts by 30% on average and improves performance by up to 19% and 10% on two of the eight evaluated benchmarks. We provide insights into when the technique is beneficial given certain characteristics of the transactional regions, analyse its advantages and disadvantages, and discuss potential solutions to overcome some of its limitations.
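The idea of prefetching ahead of a transaction can be illustrated with a small hand-written sketch in C. It is not the authors' compiler pass, which inserts the prefetches automatically. The names `tx_begin`, `tx_end`, `buckets` and `transfer` are hypothetical, and the transaction is stood in for by a simple spinlock so that the example compiles and runs on machines without hardware transactional memory. Prefetching with write intent is likewise an assumption made for this illustration. `__builtin_prefetch` is the GCC/Clang prefetch builtin.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for a hardware transaction (assumption for illustration):
 * a spinlock, so the sketch runs without HTM support. On real hardware
 * these would be the platform's transaction begin/commit operations. */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

static void tx_begin(void)
{
    while (atomic_flag_test_and_set_explicit(&fallback_lock,
                                             memory_order_acquire)) {
        /* spin until the lock is free */
    }
}

static void tx_end(void)
{
    atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
}

/* Hypothetical shared data touched inside the transactional region. */
enum { NUM_BUCKETS = 1024 };
long buckets[NUM_BUCKETS];

/* Move `amount` from bucket `src` to bucket `dst`. */
void transfer(size_t src, size_t dst, long amount)
{
    /* Before the transaction: hint that both cache lines will be
     * written soon (second argument 1 = write, third = high temporal
     * locality). A prefetch is only a hint and changes no program state. */
    __builtin_prefetch(&buckets[src], 1, 3);
    __builtin_prefetch(&buckets[dst], 1, 3);

    /* The transactional region: ideally its accesses now hit in the
     * cache, so the region is shorter and exposed to conflicts for
     * less time. */
    tx_begin();
    buckets[src] -= amount;
    buckets[dst] += amount;
    tx_end();
}
```

With `buckets[3]` initialised to 100 and `buckets[7]` to 0, calling `transfer(3, 7, 40)` leaves 60 in bucket 3 and 40 in bucket 7. The prefetches do not change the result, only (potentially) the time spent inside the region.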

Notes

If a function is called in two different transactions, we create one AP version for each calling context. AP versions are transaction-specific because the selection of instructions for each AP depends on how the memory updates performed within the function affect its callers.

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 819134), the Spanish MCIU and AEI, as well as the European Commission FEDER funds, under grant RTI2018-098156-B-C53, and the Swedish VR grant number 2016-05086.

Author information

Authors and Affiliations

Department of Computing Systems, Uppsala University, Uppsala, Sweden
Marina Shimchenko, Stefanos Kaxiras & Alexandra Jimborean
Computer Engineering Department, University of Murcia, Murcia, Spain
Rubén Titos-Gil, Ricardo Fernández-Pascual, Manuel E. Acacio, Alberto Ros & Alexandra Jimborean

Corresponding author

Correspondence to Alberto Ros.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shimchenko, M., Titos-Gil, R., Fernández-Pascual, R. et al. Analysing software prefetching opportunities in hardware transactional memory. J Supercomput 78, 919–944 (2022). https://doi.org/10.1007/s11227-021-03897-z

Accepted: 13 May 2021
Published: 02 June 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s11227-021-03897-z
