DOI: 10.1145/2465351.2465366
research-article

Whose cache line is it anyway?: operating system support for live detection and repair of false sharing

Published: 15 April 2013

ABSTRACT

As hardware parallelism continues to increase, CPU caches can no longer be considered a transparent, hardware-level performance optimization. The impact of the cache on performance, particularly in the face of false sharing, depends entirely on the software that is executing. To effectively support parallel workloads on cache-coherent hardware, the operating system must begin to treat the CPU cache like other shared hardware resources and manage it appropriately.

We demonstrate a prototype example of such support by describing Plastic, a software-based system that detects, diagnoses, and transparently repairs false sharing as it occurs in running applications. Plastic solves two challenging problems. First, it is capable of rapid, low-overhead detection and diagnosis of false sharing in unmodified, running applications. Second, it resolves identified instances of false sharing by providing a sub-page granularity memory remapping facility within the system. Our implementation is capable of identifying and repairing pathological false sharing in under one second of execution and achieves speedups of 3-6x on known examples of false sharing in parallel benchmarks.
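
As a rough illustration of the problem the abstract targets (not of Plastic's own mechanism, which remaps memory at run time without source changes), the sketch below checks whether two fields of a structure fall in the same cache line. The 64-byte line size, the structure layouts, and the helper names `line_of` and `share_line` are assumptions made for this example; none of them come from the paper.

```python
import ctypes

# Assumed cache-line size in bytes (common on x86; not stated in the abstract).
CACHE_LINE = 64


class PackedCounters(ctypes.Structure):
    # Two counters laid out back to back, e.g. one per worker thread.
    _fields_ = [("a", ctypes.c_uint64), ("b", ctypes.c_uint64)]


class PaddedCounters(ctypes.Structure):
    # Same counters, with explicit padding so that b starts on the next line.
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
        ("b", ctypes.c_uint64),
    ]


def line_of(struct_type, field):
    """Cache-line index of a field, assuming the struct base is line-aligned."""
    return getattr(struct_type, field).offset // CACHE_LINE


def share_line(struct_type, field_x, field_y):
    """True if both fields sit in the same cache line.

    If different threads write these fields, a shared line means the
    threads contend for it even though they never touch the same data.
    """
    return line_of(struct_type, field_x) == line_of(struct_type, field_y)


print(share_line(PackedCounters, "a", "b"))  # True: offsets 0 and 8, both in line 0
print(share_line(PaddedCounters, "a", "b"))  # False: b is pushed to offset 64, line 1
```

Padding of this kind is the usual manual, source-level fix. The abstract describes a system that reaches a comparable separation for unmodified, running applications through sub-page granularity remapping.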


Published in

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
April 2013
401 pages
ISBN: 9781450319942
DOI: 10.1145/2465351

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

EuroSys '13 paper acceptance rate: 28 of 143 submissions, 20%
Overall acceptance rate: 241 of 1,308 submissions, 18%
