ABSTRACT
Emerging non-volatile memory technologies enable fast, fine-grained persistence compared to slow block-based devices. In order to ensure consistency of persistent state, dirty cache lines need to be periodically flushed from caches and made persistent in an order specified by the persistency model. A persist barrier is one mechanism for enforcing this ordering.
In this paper, we first show that current persist barrier implementations, owing to certain ordering dependencies, add cache line flushes to the critical path. Our main contribution is an efficient persist barrier that reduces the number of cache line flushes happening in the critical path. We evaluate our proposed persist barrier by using it to enforce two persistency models: buffered epoch persistency with programmer-inserted barriers, and buffered strict persistency in bulk mode with hardware-inserted barriers. Experimental evaluations using micro-benchmarks (buffered epoch persistency) and multi-threaded workloads (buffered strict persistency) show that using our persist barrier improves performance by 22% and 20%, respectively, over the state-of-the-art.
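To make the ordering guarantee concrete, the following is a minimal software sketch of buffered epoch persistency with programmer-inserted barriers. It is a toy model written for illustration, not the hardware design proposed in the paper; the class `EpochPersistBuffer` and its methods are hypothetical names, and it simplifies by persisting a whole epoch at once.

```python
from collections import defaultdict


class EpochPersistBuffer:
    """Toy model of buffered epoch persistency (illustrative assumption only)."""

    def __init__(self):
        self.epoch = 0                    # epoch currently being filled
        self.pending = defaultdict(dict)  # epoch -> {address: value}, volatile
        self.nvm = {}                     # simulated persistent memory

    def store(self, addr, value):
        # Stores land in the volatile buffer, tagged with the current epoch.
        self.pending[self.epoch][addr] = value

    def persist_barrier(self):
        # The barrier only closes the current epoch. Nothing is flushed here,
        # so no write-back sits on the caller's critical path in this model.
        self.epoch += 1

    def drain_one_epoch(self):
        # Background write-back: persist the oldest buffered epoch.
        # Simplification: real hardware may persist the lines within an
        # epoch in any order; this model writes the epoch in one step.
        if not self.pending:
            return False
        oldest = min(self.pending)
        self.nvm.update(self.pending.pop(oldest))
        return True

    def crash(self):
        # On a crash, everything still buffered is lost.
        self.pending.clear()
        return dict(self.nvm)


buf = EpochPersistBuffer()
buf.store("log", "x=1")   # epoch 0: write the log entry
buf.persist_barrier()
buf.store("x", 1)         # epoch 1: update the data in place
buf.drain_one_epoch()     # only epoch 0 reaches NVM before the crash
state = buf.crash()
print(state)              # the log entry survived; the data update did not
```

In this sketch the barrier guarantees only that the log entry can never be persisted after the data update it describes, which is the property recovery code relies on.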