ABSTRACT
Dynamic probe injection is now a widely used method to debug performance in production. Current techniques for dynamic probing of native code, however, rely on an expensive stop-the-world approach: binary changes are made only while the program is in a safe state, typically one in which all program threads are halted, so that no other thread executing the modified code region steps into partially modified code. Stop-the-world patching does not scale. In contrast, low-overhead, scalable probes that can be rapidly toggled on and off in place would open up new use cases for statistical profilers and language implementations, and even for traditional ahead-of-time, native-code compilers. In this paper we introduce safe cross-modification protocols that mutate x86 code across threads without quiescing them, resulting in radically lower overheads than existing solutions. A key problem is handling instructions that straddle cache lines. We empirically evaluate existing x86 architectures to derive a safe policy given current processor behavior, and we argue that future architectures should clarify the semantics of instruction fetching to make cheap cross-modification easier and future-proof.
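To make the cache-line concern concrete, the following minimal C sketch tests whether an instruction's bytes span two cache lines. It is an illustration only, not the paper's protocol: the helper name `straddles_cache_line` is hypothetical, and the 64-byte line size is an assumption that holds on many current x86 processors but should be checked on the target machine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Common on current x86 parts,
 * but not guaranteed; query the target machine in real code. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper (not from the paper): returns true when the
 * len-byte instruction starting at addr occupies two cache lines,
 * i.e. its first and last bytes fall in different lines. */
static bool straddles_cache_line(uintptr_t addr, size_t len)
{
    if (len == 0) {
        return false; /* an empty range cannot cross a boundary */
    }
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line = (addr + len - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

For example, a 5-byte instruction at offset 60 covers bytes 60 through 64 and so crosses the boundary at byte 64, while the same instruction at offset 59 fits entirely within one line.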
Living on the edge: rapid-toggling probes with cross-modification on x86
PLDI '16