DOI: 10.1145/2908080.2908084
research-article

Living on the edge: rapid-toggling probes with cross-modification on x86

Published: 02 June 2016

ABSTRACT

Dynamic probe injection is now a widely used method to debug performance in production. Current techniques for dynamic probing of native code, however, rely on an expensive stop-the-world approach: binary changes are made while the program is in a safe state (typically one in which all program threads are halted) to ensure that another thread executing the modified code region does not step into partially modified code. Stop-the-world patching is not scalable. In contrast, low-overhead, scalable probes that can be rapidly toggled on and off in place would open up new use cases for statistical profilers and language implementations, and even for traditional ahead-of-time, native-code compilers. In this paper we introduce safe cross-modification protocols that mutate x86 code between threads but do not require quiescing threads, resulting in radically lower overheads than existing solutions. A key problem is handling instructions that straddle cache lines. We empirically evaluate existing x86 architectures to derive a safe policy given current processor behavior, and we argue that future architectures should clarify the semantics of instruction fetching to make cheap cross-modification easier and future-proof.
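
To make the cache-line concern concrete, the following is a minimal sketch of how one might test whether an instruction's bytes cross a cache-line boundary. It is an illustration only, not the protocol from the paper: the helper name `straddles_cache_line` is hypothetical, and the 64-byte line size is an assumption (common on x86 processors, but not stated in the abstract).

```python
# Assumption: 64-byte cache lines. The abstract does not give a line size.
CACHE_LINE_BYTES = 64


def straddles_cache_line(addr: int, length: int, line: int = CACHE_LINE_BYTES) -> bool:
    """Return True if the bytes [addr, addr + length) span more than one cache line.

    Hypothetical helper for illustration; `addr` is the address of the first
    instruction byte and `length` is the instruction size in bytes.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first_line = addr // line                 # line holding the first byte
    last_line = (addr + length - 1) // line   # line holding the last byte
    return first_line != last_line


# A 5-byte instruction (for example, a call with a 32-bit relative offset):
print(straddles_cache_line(0x1000, 5))  # False: occupies offsets 0..4 of one line
print(straddles_cache_line(0x103B, 5))  # False: occupies offsets 59..63 of one line
print(straddles_cache_line(0x103D, 5))  # True: occupies offsets 61..63, then 0..1 of the next
```

A patching tool could use a check of this kind to separate probe sites that sit entirely within one line from sites that cross a boundary and therefore need more careful handling. What that handling should be is the subject of the paper and is not reproduced here.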


Published in

PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2016
726 pages
ISBN: 9781450342612
DOI: 10.1145/2908080
General Chair: Chandra Krintz
Program Chair: Emery Berger

ACM SIGPLAN Notices, Volume 51, Issue 6
PLDI '16
June 2016
726 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2980983
Editor: Andy Gill

Copyright © 2016 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 406 of 2,067 submissions, 20%
