DOI: 10.1145/2908080.2908090
research-article
Open access

Remix: online detection and repair of cache contention for the JVM

Published: 02 June 2016

Abstract

As ever more computation shifts onto multicore architectures, it is increasingly critical to find effective ways of dealing with multithreaded performance bugs such as true and false sharing. Previous approaches to fixing false sharing in unmanaged languages have employed highly invasive runtime program modifications. We observe that managed language runtimes, with garbage collection and JIT code compilation, present unique opportunities to repair such bugs directly, mirroring the techniques used in manual repairs. We present Remix, a modified version of the Oracle HotSpot JVM which can detect cache contention bugs and repair false sharing at runtime. Remix's detection mechanism leverages recent performance counter improvements on Intel platforms, which allow for precise, unobtrusive monitoring of cache contention at the hardware level. Remix can detect and repair known false sharing issues in the LMAX Disruptor high-performance inter-thread messaging library and the Spring Reactor event-processing framework, automatically providing 1.5-2x speedups over unoptimized code and matching the performance of hand-optimization. Remix also finds a new false sharing bug in SPECjvm2008, and uncovers a true sharing bug in the HotSpot JVM that, when fixed, improves the performance of three NAS Parallel Benchmarks by 7-25x. Remix incurs no statistically significant performance overhead on other benchmarks that do not exhibit cache contention, making Remix practical for always-on use.
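A common manual repair for false sharing, and the kind of fix the abstract says Remix mirrors at runtime, is to pad contended data so that values written by different threads land on different cache lines. The Java sketch below is an illustration only, not Remix's mechanism: it assumes 64-byte cache lines, and the class `FalseSharingDemo`, its `run` method, and the `LONGS_PER_LINE` constant are hypothetical names. Each thread increments its own slot of an `AtomicLongArray`; with a stride of 1 the slots are adjacent 8-byte elements and usually share a line, while a stride of 8 longs (64 bytes) guarantees that no two slots share a line.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Illustrative sketch only (hypothetical class, not part of Remix or HotSpot).
class FalseSharingDemo {
    // Assumption: 64-byte cache lines, i.e. 8 longs per line.
    static final int LONGS_PER_LINE = 8;

    // Starts one thread per slot; thread t increments element t * stride
    // `iters` times. stride = 1 packs the counters into adjacent elements
    // (likely false sharing); stride = LONGS_PER_LINE puts 64 bytes between
    // consecutive slots, so no two slots can share a 64-byte line.
    // Returns elapsed wall-clock time in nanoseconds.
    static long run(int threads, int stride, long iters, AtomicLongArray counters)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t * stride;
            // incrementAndGet is an atomic read-modify-write, so every
            // iteration writes the shared array in memory.
            workers[t] = new Thread(() -> {
                for (long i = 0; i < iters; i++) {
                    counters.incrementAndGet(slot);
                }
            });
        }
        long start = System.nanoTime();
        for (Thread w : workers) w.start();
        for (Thread w : workers) w.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 4;
        long iters = 20_000_000L;
        // Packed: counters in adjacent 8-byte elements.
        AtomicLongArray packed = new AtomicLongArray(threads);
        // Padded: one counter every 8 elements (64 bytes).
        AtomicLongArray padded = new AtomicLongArray(threads * LONGS_PER_LINE);
        long tPacked = run(threads, 1, iters, packed);
        long tPadded = run(threads, LONGS_PER_LINE, iters, padded);
        System.out.printf("packed: %d ms, padded: %d ms%n",
                tPacked / 1_000_000, tPadded / 1_000_000);
    }
}
```

On a multicore machine the packed run is usually slower than the padded one, but the gap depends heavily on the hardware, and no specific timings are implied here. Array elements are used instead of named fields because the JVM may reorder fields within an object; that is why hand-optimized libraries pad more defensively (the Disruptor, for instance, surrounds hot fields with unused padding longs), and why JDK 8 provides the `@sun.misc.Contended` annotation, which is honored for application classes only under `-XX:-RestrictContended`.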

Information

Published In

PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2016
726 pages
ISBN: 9781450342612
DOI: 10.1145/2908080
  • General Chair: Chandra Krintz
  • Program Chair: Emery Berger

ACM SIGPLAN Notices, Volume 51, Issue 6 (PLDI '16)
June 2016
726 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2980983
  • Editor: Andy Gill
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 June 2016

Author Tags

  1. Java
  2. cache coherence
  3. false sharing

Qualifiers

  • Research-article

Funding Sources

  • NSF

Conference

PLDI '16

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 208
  • Downloads (last 6 weeks): 48
Reflects downloads up to 17 Jan 2025

Cited By

  • (2023) DJXPerf: Identifying Memory Inefficiencies via Object-Centric Profiling for Java. Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, pages 81-94. DOI: 10.1145/3579990.3580010. Online publication date: 17-Feb-2023.
  • (2023) Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison. IEEE Transactions on Parallel and Distributed Systems 34(5), pages 1594-1608. DOI: 10.1109/TPDS.2023.3257105. Online publication date: May-2023.
  • (2022) OJXPerf. Proceedings of the 44th International Conference on Software Engineering, pages 1558-1570. DOI: 10.1145/3510003.3510083. Online publication date: 21-May-2022.
  • (2022) PInTE: Probabilistic Induction of Theft Evictions. 2022 IEEE International Symposium on Workload Characterization (IISWC), pages 1-13. DOI: 10.1109/IISWC55918.2022.00011. Online publication date: Nov-2022.
  • (2022) Raptor: Mitigating CPU-GPU False Sharing Under Unified Memory Systems. 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC), pages 1-8. DOI: 10.1109/IGSC55832.2022.9969376. Online publication date: 24-Oct-2022.
  • (2020) HangFix. Proceedings of the 11th ACM Symposium on Cloud Computing, pages 344-357. DOI: 10.1145/3419111.3421288. Online publication date: 12-Oct-2020.
  • (2020) Efficient nursery sizing for managed languages on multi-core processors with shared caches. Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, pages 1-15. DOI: 10.1145/3368826.3377908. Online publication date: 22-Feb-2020.
  • (2019) A zero-positive learning approach for diagnosing software performance regressions. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pages 11627-11639. DOI: 10.5555/3454287.3455330. Online publication date: 8-Dec-2019.
  • (2019) Evaluating the effectiveness of program data features for guiding memory management. Proceedings of the International Symposium on Memory Systems, pages 383-395. DOI: 10.1145/3357526.3357537. Online publication date: 30-Sep-2019.
  • (2019) Pinpointing performance inefficiencies in Java. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 818-829. DOI: 10.1145/3338906.3338923. Online publication date: 12-Aug-2019.