skip to main content
10.1145/3079856.3080240acmconferencesArticle/Chapter ViewAbstractPublication PagesiscaConference Proceedingsconference-collections
research-article

Hiding the Long Latency of Persist Barriers Using Speculative Execution

Published: 24 June 2017 Publication History

Abstract

Byte-addressable non-volatile memory technology is emerging as an alternative for DRAM for main memory. This new Non-Volatile Main Memory (NVMM) allows programmers to store important data in data structures in memory instead of serializing it to the file system, thereby providing a substantial performance boost. However, modern systems reorder memory operations and utilize volatile caches for better performance, making it difficult to ensure a consistent state in NVMM. Intel recently announced a new set of persistence instructions, clflushopt, clwb, and pcommit. These new instructions make it possible to implement fail-safe code on NVMM, but few workloads have been written or characterized using these new instructions.
In this work, we describe how these instructions work and how they can be used to implement write-ahead logging based transactions. We implement several common data structures and kernels and evaluate the performance overhead incurred over traditional non-persistent implementations. In particular, we find that persistence instructions occur in clusters along with expensive fence operations, they have long latency, and they add a significant execution time overhead, on average by 20.3% over code with logging but without fence instructions to order persists.
To deal with this overhead and alleviate the performance bottleneck, we propose to speculate past long latency persistency operations using checkpoint-based processing. Our speculative persistence architecture reduces the execution time overheads to only 3.6%.

References

[1]
Hiroyuki Akinaga and Hisashi Shima. 2010. Resistive random access memory (ReRAM) based on metal oxides. In IEEE, Vol. 98, Issue: 12. 2237--2251.
[2]
C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiserson, and Sean Lie. 2005. Unbounded Transactional Memory. In Proceedings of the 11th International Symposium on High-Performance Computer Architecture (HPCA '05). IEEE Computer Society, Washington, DC, USA, 316--327.
[3]
ARM. 2016. ARMv8-A architecture evolution. (January 2016). https://community.arm.com/groups/processors/blog/2016/01/05/armv8-a-architecture-evolution.
[4]
NVM Library Team at Intel. 2016. Persistent Memory Programming. (August 2016). http://pmem.io.
[5]
Amro Awad, Sergey Blagodurov, and Yan Solihin. 2016. Write-Aware Management of NVM-based Memory Extensions. In Proceedings of the 2016 International Conference on Supercomputing (ICS '16). ACM, New York, NY, USA, Article 9, 12 pages.
[6]
Amro Awad, Pratyusa Manadhata, Stuart Haber, Yan Solihin, and William Horne. 2016. Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16). ACM, New York, NY, USA, 263--276.
[7]
Colin Blundell, Milo M.K. Martin, and Thomas F. Wenisch. 2009. InvisiFence: Performance-transparent Memory Ordering in Conventional Multiprocessors. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA '09). ACM, New York, NY, USA, 233--244.
[8]
Luis Ceze, Karin Strauss, James Tuck, Josep Torrellas, and Jose Renau. 2006. CAVA: Using Checkpoint-assisted Value Prediction to Hide L2 Misses. ACM Trans. Archit. Code Optim. 3, 2 (June 2006), 182--208.
[9]
Luis Ceze, James Tuck, Pablo Montesinos, and Josep Torrellas. 2007. BulkSC: Bulk Enforcement of Sequential Consistency. In Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA '07). ACM, New York, NY, USA, 278--289.
[10]
Marcelo Cintra, José F. Martínez, and Josep Torrellas. 2000. Architectural Support for Scalable Speculative Parallelization in Shared-memory Multiprocessors. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA '00). ACM, New York, NY, USA, 13--24.
[11]
Joel Coburn, Adrian M. Caulfield, Ameen Akel, Laura M. Grupp, Rajesh K. Gupta, Ranjit Jhala, and Steven Swanson. 2011. NV-Heaps: Making Persistent Objects Fast and Safe with Next-generation, Non-volatile Memories. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). ACM, New York, NY, USA, 105--118.
[12]
Jeremy Condit, Edmund B. Nightingale, Christopher Frost, Engin Ipek, Benjamin Lee, Doug Burger, and Derrick Coetzee. 2009. Better I/O Through Byte-addressable, Persistent Memory. In Proceedings of the ACM SIGOPS 22Nd Symposium on Operating Systems Principles (SOSP '09). ACM, New York, NY, USA, 133--146.
[13]
Intel Corp. 2016. Intel 64 and IA-32 Architectures Developer's Manual: Vol. 3A. Intel.
[14]
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth T. Srinivasan, and Konrad Lai. 2005. Scalable Load and Store Processing in Latency Tolerant Processors. In Proceedings of the 32Nd Annual International Symposium on Computer Architecture (ISCA '05). IEEE Computer Society, Washington, DC, USA, 446--457.
[15]
Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. 1991. Two Techniques to Enhance the Performance of Memory Consistency Models. In In Proceedings of the 1991 International Conference on Parallel Processing. 355--364.
[16]
Chris Gniady and Babak Falsafi. 2002. Speculative Sequential Consistency with Little Custom Storage. In Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT '02). IEEE Computer Society, Washington, DC, USA, 179--188. http://dl.acm.org/citation.cfm?id=645989.674317
[17]
Chris Gniady, Babak Falsafi, and T. N. Vijaykumar. 1999. Is SC + ILP = RC?. In Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA '99). IEEE Computer Society, Washington, DC, USA, 162--171.
[18]
Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben Hertzberg, Mike Chen, Christos Kozyrakis, and Kunle Olukotun. 2004. Programming with Transactional Coherence and Consistency (TCC). In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XI). ACM, New York, NY, USA, 1--13.
[19]
Maurice Herlihy and J. Eliot B. Moss. 1993. Transactional Memory: Architectural Support for Lock-free Data Structures. In ISCA '93: Proceedings of the 20th Annual International Symposium on Computer Architecture. ACM Press, New York, NY, USA, 289--300.
[20]
Intel. 2016. Deprecate PCommit Instruction. (September 2016). https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction.
[21]
Intel and Micron. 2015. Intel and Micron Produce Breakthrough Memory Technology. (Jul. 2015). https://newsroom.intel.com/news-releases/intel-and-micron-produce-breakthrough-memory-technology.
[22]
Arpit Joshi, Vijay Nagarajan, Marcelo Cintra, and Stratis Viglas. 2015. Efficient Persist Barriers for Multicores. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO-48). ACM, New York, NY, USA, 660--671.
[23]
T. Kawahara, R. Takemura, K. Miura, J. Hayakawa, S. Ikeda, Y. Lee, R. Sasaki, Y. Goto, K. Ito, T. Meguro, F. Matsukura, H. Takahashi, H. Matsuoka, and H. Ohno. 2007. 2Mb Spin-Transfer Torque RAM (SPRAM) with Bit-by-Bit Bidirectional Current Write and Parallelizing-Direction Current Read. In IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[24]
N. Kirman, M. Kirman, M. Chaudhuri, and J. F. Martinez. 2005. Checkpointed Early Load Retirement. In Proceedings of the 11th International Symposium on High Performance Computer Architecture.
[25]
Aasheesh Kolli, Steven Pelley, Ali Saidi, Peter M. Chen, and Thomas F. Wenisch. 2016. High-Performance Transactions for Persistent Memories. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16). ACM, New York, NY, USA, 399--411.
[26]
Mark H. Kryder and Chang Soo Kim. 2009. After Hard Drives -- What Comes Next? IEEE Transactions on Magnetics, Vol. 45, Issue: 10, 3406--3413.
[27]
Emre Kultursay, Mahmut Kandemir, Anand Sivasubramaniam, and Onur Mutlu. 2013. Evaluating STT-RAM as an energy-effcient main memory alternative. In IEEE International Symposium on Performance Analysis of Systems and Software.
[28]
Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger. 2009. Architecting Phase Change Memory As a Scalable Dram Alternative. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA '09). ACM, New York, NY, USA, 2--13.
[29]
Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, and Doug Burger. 2010. Phase-Change Technology and the Future of Main Memory. IEEE Micro 30, 1 (Jan. 2010), 143--143.
[30]
Youyou Lu, Jiwu Shu, Long Sun, and Onur Mutlu. 2014. Loose-Ordering Consistency for persistent memory. In Computer Design, 2014 32nd IEEE International Conference on (ICCD'14).
[31]
José F. Martínez and Josep Torrellas. 2002. Speculative Synchronization: Applying Thread-level Speculation to Explicitly Parallel Applications. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS X). ACM, New York, NY, USA, 18--29.
[32]
C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh, and Peter Schwarz. 1992. ARIES: A Transaction Recovery Method Supporting Fine-granularity Locking and Partial Rollbacks Using Write-ahead Logging. ACM Trans. Database Syst. 17, 1 (March 1992), 94--162.
[33]
Iulian Moraru, David G. Andersen, Michael Kaminsky, Niraj Tolia, Parthasarathy Ranganathan, and Nathan Binkert. 2013. Consistent, Durable, and Safe Memory Management for Byte-addressable Non Volatile Main Memory. In Proceedings of the First ACM SIGOPS Conference on Timely Results in Operating Systems (TRIOS '13). ACM, New York, NY, USA, Article 1, 17 pages.
[34]
Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt. 2003. Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. In Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA '03). IEEE Computer Society, Washington, DC, USA, 129--. http://dl.acm.org/citation.cfm?id=822080.822823
[35]
Vijay S. Pai, Parthasarathy Ranganathan, Sarita V. Adve, and Tracy Harton. 1996. An Evaluation of Memory Consistency Models for Shared-memory Systems with ILP Processors. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII). ACM, New York, NY, USA, 12--23.
[36]
Avadh Patel, Furat Afram, Shunfei Chen, and Kanad Ghose. 2011. MARSSx86: A Full System Simulator for x86 CPUs. In Design Automation Conference.
[37]
Steven Pelley, Peter M. Chen, and Thomas F. Wenisch. 2014. Memory Persistency. In Proceeding of the 41st Annual International Symposium on Computer Architecuture (ISCA '14). IEEE Press, Piscataway, NJ, USA, 265--276.
[38]
Raghunath Rajachandrasekar, Sreeram Potluri, Akshay Venkatesh, Khaled Hami-douche, Md. Wasi-ur Rahman, and Dhabaleswar K. (DK) Panda. 2014. MIC-Check: A Distributed Check Pointing Framework for the Intel Many Integrated Cores Architecture. In Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing (HPDC '14). ACM, New York, NY, USA, 121--124.
[39]
Ravi Rajwar and James R. Goodman. 2001. Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution. In Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO 34). IEEE Computer Society, Washington, DC, USA, 294--305. http://dl.acm.org/citation.cfm?id=563998.564036
[40]
Parthasarathy Ranganathan, Vijay S. Pai, and Sarita V. Adve. 1997. Using Speculative Retirement and Larger Instruction Windows to Narrow the Performance Gap Between Memory Consistency Models. In Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '97). ACM, New York, NY, USA, 199--210.
[41]
Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu, and Onur Mutlu. 2015. ThyNVM: Enabling Software-transparent Crash Consistency in Persistent Memory Systems. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO-48). ACM, New York, NY, USA, 672--685.
[42]
Srikanth T. Srinivasan, Ravi Rajwar, Haitham Akkary, Amit Gandhi, and Mike Upton. 2004. Continual Flow Pipelines. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XI). ACM, New York, NY, USA, 107--119.
[43]
J. Steffan and T Mowry. 1998. The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA '98). IEEE Computer Society, Washington, DC, USA, 2--. http://dl.acm.org/citation.cfm?id=822079.822712
[44]
J. Greggory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry. 2000. A Scalable Approach to Thread-level Speculation. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA '00). ACM, New York, NY, USA, 1--12.
[45]
Haris Volos, Andres Jaan Tack, and Michael M. Swift. 2011. Mnemosyne: Lightweight Persistent Memory. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). ACM, New York, NY, USA, 91--104.
[46]
Chundong Wang, Qingsong Wei, Jun Yang, Cheng Chen, and Mingdi Xue. 2015. How to Be Consistent with Persistent Memory? An Evaluation Approach. In IEEE International Conference on Networking, Architecture and Storage (NAS'15).
[47]
Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos. 2007. Mechanisms for Store-wait-free Multiprocessors. In Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA '07). ACM, New York, NY, USA, 266--277.
[48]
Jun Yang, Qingsong Wei, Cheng Chen, Chundong Wang, Khai Leong Yong, and Bingsheng He. 2015. NV-Tree: Reducing Consistency Cost for NVM-based Single Level Systems. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST'15). USENIX Association, Berkeley, CA, USA, 167--181. http://dl.acm.org/citation.cfm?id=2750482.2750495
[49]
Jishen Zhao, Sheng Li, Doe Hyun Yoon, Yuan Xie, and Norman P. Jouppi. 2013. Kiln: Closing the Performance Gap Between Systems with and Without Persistence Support. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46). ACM, New York, NY, USA, 421--432.

Cited By

View all
  • (2024)Hercules: Enabling Atomic Durability for Persistent Memory with Transient Persistence DomainACM Transactions on Embedded Computing Systems10.1145/360747323:6(1-34)Online publication date: 11-Sep-2024
  • (2024)A quantitative evaluation of persistent memory hash indexesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-023-00812-133:2(375-397)Online publication date: 1-Mar-2024
  • (2023)ENTS: Flush-and-Fence-Free Failure Atomic TransactionsProceedings of the International Symposium on Memory Systems10.1145/3631882.3631907(1-16)Online publication date: 2-Oct-2023
  • Show More Cited By

Index Terms

  1. Hiding the Long Latency of Persist Barriers Using Speculative Execution

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
      June 2017
      736 pages
      ISBN:9781450348928
      DOI:10.1145/3079856
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 24 June 2017

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Failure Safety
      2. Non-Volatile Main Memory
      3. Speculative Persistence

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ISCA '17
      Sponsor:

      Acceptance Rates

      ISCA '17 Paper Acceptance Rate 54 of 322 submissions, 17%;
      Overall Acceptance Rate 543 of 3,203 submissions, 17%

      Upcoming Conference

      ISCA '25

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)29
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 05 Mar 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Hercules: Enabling Atomic Durability for Persistent Memory with Transient Persistence DomainACM Transactions on Embedded Computing Systems10.1145/360747323:6(1-34)Online publication date: 11-Sep-2024
      • (2024)A quantitative evaluation of persistent memory hash indexesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-023-00812-133:2(375-397)Online publication date: 1-Mar-2024
      • (2023)ENTS: Flush-and-Fence-Free Failure Atomic TransactionsProceedings of the International Symposium on Memory Systems10.1145/3631882.3631907(1-16)Online publication date: 2-Oct-2023
      • (2023)SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA56546.2023.10071082(677-690)Online publication date: Feb-2023
      • (2022)Accelerate Hardware Logging for Efficient Crash Consistency in Persistent Memory2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE54114.2022.9774633(388-393)Online publication date: 14-Mar-2022
      • (2022)A Primer on Memory PersistencySynthesis Lectures on Computer Architecture10.2200/S011157ED1V01Y202201CAC05817:1(1-115)Online publication date: 9-Feb-2022
      • (2022)FFCCDProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3527406(274-288)Online publication date: 18-Jun-2022
      • (2021)Efficient Hardware-assisted Out-place Update for Persistent Memory2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE51398.2021.9474136(507-512)Online publication date: 1-Feb-2021
      • (2021)Dolos: Improving the Performance of Persistent Applications in ADR-Supported Secure MemoryMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480118(1241-1253)Online publication date: 18-Oct-2021
      • (2021)COSPlay: Leveraging Task-Level Parallelism for High-Throughput Synchronous PersistenceMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480075(86-99)Online publication date: 18-Oct-2021
      • Show More Cited By

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media