research-article

Living on the edge: rapid-toggling probes with cross-modification on x86

Authors:
Buddhika Chamith, Indiana University, USA
Bo Joel Svensson, Indiana University, USA
Luke Dalessandro, Indiana University, USA
Ryan R. Newton, Indiana University, USA

PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2016, Pages 16–26. https://doi.org/10.1145/2908080.2908084

Published: 02 June 2016

ABSTRACT

Dynamic probe injection is now a widely used method to debug performance in production. Current techniques for dynamic probing of native code, however, rely on an expensive stop-the-world approach: binary changes are made within a safe state of the program (typically one in which all program threads are halted) to ensure that another thread executing the modified code region doesn't step into partially modified code. Stop-the-world patching is not scalable. In contrast, low-overhead, scalable probes that can be rapidly toggled on and off in place would open up new use cases for statistical profilers and language implementations, even traditional ahead-of-time, native-code compilers. In this paper we introduce safe cross-modification protocols that mutate x86 code between threads but do not require quiescing threads, resulting in radically lower overheads than existing solutions. A key problem is handling instructions that straddle cache lines. We empirically evaluate existing x86 architectures to derive a safe policy given current processor behavior, and we argue that future architectures should clarify the semantics of instruction fetching to make cheap cross-modification easier and future-proof.
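To make the cache-line concern concrete, the following is a minimal sketch of how one might test whether a patch site crosses a cache-line boundary. It is not taken from the paper: the 64-byte line size is an assumption that matches common current x86 processors, and `straddles_cache_line` is a hypothetical helper name. A 5-byte instruction such as `call rel32` is used only as an illustrative size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on common current x86 parts. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical helper: returns true if the `len` bytes starting at
 * `addr` span two cache lines, i.e. the first and last byte of the
 * instruction fall in different lines.
 */
static bool straddles_cache_line(uintptr_t addr, size_t len)
{
    /* An instruction longer than one line is outside this sketch. */
    assert(len > 0 && len <= CACHE_LINE_SIZE);

    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line  = (addr + len - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

A 5-byte instruction at line offset 59 occupies bytes 59 to 63 and stays within one line, while the same instruction at offset 60 spills one byte into the next line. A check of this kind only identifies the straddling sites; what to do with them is the subject of the protocols the abstract describes.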

Published in
PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2016, 726 pages
ISBN: 9781450342612
DOI: 10.1145/2908080
General Chair: Chandra Krintz, University of California at Santa Barbara, USA
Program Chair: Emery Berger, University of Massachusetts at Amherst, USA

ACM SIGPLAN Notices Volume 51, Issue 6 (PLDI '16)
June 2016, 726 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2980983
Editor: Andy Gill, University of Kansas, Lawrence, KS
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
application profiling
dynamic instrumentation
Acceptance Rates
Overall Acceptance Rate: 406 of 2,067 submissions, 20%