Hiding the Long Latency of Persist Barriers Using Speculative Execution

Authors:

Yan Solihin

ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture

Pages 175 - 186

https://doi.org/10.1145/3079856.3080240

Published: 24 June 2017

Abstract

Byte-addressable non-volatile memory technology is emerging as an alternative to DRAM for main memory. This new Non-Volatile Main Memory (NVMM) allows programmers to store important data in data structures in memory instead of serializing it to the file system, thereby providing a substantial performance boost. However, modern systems reorder memory operations and utilize volatile caches for better performance, making it difficult to ensure a consistent state in NVMM. Intel recently announced a new set of persistence instructions, clflushopt, clwb, and pcommit. These new instructions make it possible to implement fail-safe code on NVMM, but few workloads have been written or characterized using these new instructions.

In this work, we describe how these instructions work and how they can be used to implement write-ahead-logging-based transactions. We implement several common data structures and kernels and evaluate the performance overhead incurred over traditional non-persistent implementations. In particular, we find that persistence instructions occur in clusters along with expensive fence operations, that they have long latency, and that they add significant execution time overhead: 20.3% on average over code with logging but without fence instructions to order persists.
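As a rough illustration of why flushes and fences cluster in write-ahead logging, the following is a minimal sketch in Python of an undo-log transaction over a toy memory model. It is not the paper's implementation. The class `SimulatedNVMM`, the helper names, and the `log_count` and `("log", i)` locations are all hypothetical; `flush` only stands in for a cache-line write-back such as clwb or clflushopt, and `fence` stands in for the persist barrier that waits for earlier write-backs. The model pessimistically assumes that anything not yet fenced is lost on a crash.

```python
class SimulatedNVMM:
    """Toy model (an assumption, not real hardware): a volatile cache in front of NVMM."""

    def __init__(self):
        self.cache = {}    # volatile copies of lines; lost on a crash
        self.pending = {}  # flushed write-backs still in flight; treated as lost on a crash
        self.nvmm = {}     # durable contents
        self.fences = 0    # number of persist barriers executed

    def store(self, addr, value):
        self.cache[addr] = value

    def load(self, addr, default=None):
        if addr in self.cache:
            return self.cache[addr]
        return self.nvmm.get(addr, default)

    def flush(self, addr):
        # Stand-in for clwb/clflushopt: start a write-back without waiting for it.
        if addr in self.cache:
            self.pending[addr] = self.cache[addr]

    def fence(self):
        # Stand-in for a persist barrier: earlier flushes become durable here.
        self.nvmm.update(self.pending)
        self.pending.clear()
        self.fences += 1

    def crash(self):
        # Power loss: everything that was not made durable disappears.
        self.cache.clear()
        self.pending.clear()


def log_old_values(mem, updates):
    """Step 1: persist an undo log of the old values, then mark the log valid."""
    for i, addr in enumerate(updates):
        mem.store(("log", i), (addr, mem.load(addr)))
        mem.flush(("log", i))
    mem.fence()  # log entries are durable before the log is marked valid
    mem.store("log_count", len(updates))
    mem.flush("log_count")
    mem.fence()  # the valid log is durable before any in-place update


def apply_updates(mem, updates):
    """Step 2: update the data in place and persist it."""
    for addr, value in updates.items():
        mem.store(addr, value)
        mem.flush(addr)
    mem.fence()  # new data is durable before the log is invalidated


def commit(mem):
    """Step 3: invalidate the log so recovery will not undo the transaction."""
    mem.store("log_count", 0)
    mem.flush("log_count")
    mem.fence()


def transaction(mem, updates):
    log_old_values(mem, updates)
    apply_updates(mem, updates)
    commit(mem)


def recover(mem):
    """After a crash, roll back any transaction whose log is still valid."""
    count = mem.load("log_count", 0)
    for i in range(count):
        addr, old = mem.load(("log", i))
        mem.store(addr, old)
        mem.flush(addr)
    mem.fence()
    commit(mem)
```

In this sketch a single transaction executes four fences, each of which must wait for the flushes before it. That is the kind of flush-and-fence cluster the abstract describes, although real code and real hardware differ in many details.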

To deal with this overhead and alleviate the performance bottleneck, we propose to speculate past long latency persistency operations using checkpoint-based processing. Our speculative persistence architecture reduces the execution time overheads to only 3.6%.
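To put the two overhead figures side by side, here is a small worked calculation in Python. It assumes that both percentages are measured against the same baseline (code with logging but without fences), which the abstract implies but does not state outright; the function name `overhead_summary` is made up for this illustration.

```python
def overhead_summary(baseline_overhead, speculative_overhead):
    """Compare two execution-time overheads measured against the same baseline.

    Returns (speedup, fraction_removed): how much faster the speculative design
    runs than the non-speculative one, and what share of the overhead it removes.
    """
    time_without_speculation = 1.0 + baseline_overhead
    time_with_speculation = 1.0 + speculative_overhead
    speedup = time_without_speculation / time_with_speculation
    fraction_removed = (baseline_overhead - speculative_overhead) / baseline_overhead
    return speedup, fraction_removed


# Figures from the abstract: 20.3% overhead reduced to 3.6%.
speedup, fraction_removed = overhead_summary(0.203, 0.036)
```

Under that assumption, the speculative design runs about 1.16 times as fast as the fenced code, and it removes roughly 82% of the overhead that the persist barriers add.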

Index Terms

1. Computer systems organization
  1. Architectures
2. Hardware
  1. Emerging technologies
    1. Memory and dense storage

Published In

ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture

June 2017

736 pages

ISBN:9781450348928

DOI:10.1145/3079856

ACM SIGARCH Computer Architecture News Volume 45, Issue 2
ISCA'17
May 2017
715 pages
ISSN:0163-5964
DOI:10.1145/3140659
Editor:
Babak Falsafi
Interim

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IEEE: IEEE Computer Society Technical Committee on Design Automation
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ISCA '17

Sponsor:

IEEE
SIGARCH

ISCA '17: The 44th Annual International Symposium on Computer Architecture

June 24 - 28, 2017

Toronto, ON, Canada

Acceptance Rates

ISCA '17 Paper Acceptance Rate 54 of 322 submissions, 17%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%
