Efficient persist barriers for multicores

Authors:
Arpit Joshi, University of Edinburgh
Vijay Nagarajan, University of Edinburgh
Marcelo Cintra, Intel, Germany
Stratis Viglas, University of Edinburgh

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture, December 2015, Pages 660–671
https://doi.org/10.1145/2830772.2830805

Published: 05 December 2015

ABSTRACT

Emerging non-volatile memory technologies enable fast, fine-grained persistence compared to slow block-based devices. In order to ensure consistency of persistent state, dirty cache lines need to be periodically flushed from caches and made persistent in an order specified by the persistency model. A persist barrier is one mechanism for enforcing this ordering.

In this paper, we first show that current persist barrier implementations, owing to certain ordering dependencies, add cache line flushes to the critical path. Our main contribution is an efficient persist barrier that reduces the number of cache line flushes happening in the critical path. We evaluate our proposed persist barrier by using it to enforce two persistency models: buffered epoch persistency with programmer-inserted barriers, and buffered strict persistency in bulk mode with hardware-inserted barriers. Experimental evaluations using micro-benchmarks (buffered epoch persistency) and multi-threaded workloads (buffered strict persistency) show that using our persist barrier improves performance by 22% and 20%, respectively, over the state of the art.
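
The ordering that a persist barrier enforces under buffered epoch persistency can be illustrated with a small software model. The sketch below is not the hardware design proposed in the paper; it is a toy simulation in which the class `EpochPersistModel`, its methods, and the `demo` function are all hypothetical names chosen for illustration. It assumes a single thread, and it simplifies by draining a whole epoch to persistent memory at once. It shows only the guarantee itself: stores separated by a barrier become persistent in barrier order, while the barrier does not wait for any flush.

```python
class EpochPersistModel:
    """Toy single-thread model of buffered epoch persistency (illustrative only)."""

    def __init__(self):
        self.epoch = 0      # epoch that new stores are assigned to
        self.pending = {}   # epoch -> list of (addr, value) not yet persistent
        self.nvm = {}       # contents of persistent memory

    def store(self, addr, value):
        # A store lands in the volatile cache, tagged with the current epoch.
        self.pending.setdefault(self.epoch, []).append((addr, value))

    def persist_barrier(self):
        # Buffered: the barrier only closes the current epoch.
        # It does not stall until earlier stores are persistent.
        self.epoch += 1

    def drain_oldest_epoch(self):
        # Background flush: persist every store of the oldest pending epoch.
        # Epochs always drain oldest first, which is the ordering guarantee.
        if not self.pending:
            return False
        oldest = min(self.pending)
        for addr, value in self.pending.pop(oldest):
            self.nvm[addr] = value
        return True

    def crash(self):
        # Power failure: volatile state is lost, persistent state survives.
        self.pending.clear()
        return dict(self.nvm)


def demo():
    m = EpochPersistModel()
    m.store("log", "x=1")      # epoch 0: write the undo/redo log entry
    m.persist_barrier()
    m.store("x", 1)            # epoch 1: update the data in place
    m.persist_barrier()
    m.store("commit", True)    # epoch 2: mark the update as committed
    m.drain_oldest_epoch()     # epoch 0 becomes persistent
    m.drain_oldest_epoch()     # epoch 1 becomes persistent
    return m.crash()           # crash before epoch 2 is drained


if __name__ == "__main__":
    print(demo())
```

In this model a crash can never leave `commit` persistent without `log` and `x`, because a later epoch is drained only after all earlier ones. Real hardware must provide the same guarantee across cores and cache lines, which is where the ordering dependencies discussed in the abstract arise.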


Published in
MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
December 2015
787 pages
ISBN:9781450340342
DOI:10.1145/2830772
General Chair:
Milos Prvulovic
Georgia Tech
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data persistence
multicore
non-volatile memory
persist barrier
Qualifiers
- research-article
Acceptance Rates
MICRO-48 Paper Acceptance Rate: 61 of 283 submissions, 22%
Overall Acceptance Rate: 484 of 2,242 submissions, 22%