Abstract
Dynamic instrumentation systems are valuable for program profiling and analysis, architectural simulation, and bug detection. However, target programs slow down substantially when instrumented by these systems, mainly because of resource contention between the target program and the instrumentation system. As multi-core processors become increasingly prevalent, they offer more hardware parallelism for software to exploit. In this paper, we propose leveraging the abundant computing resources of multi-core systems to accelerate program instrumentation. We design and implement a multi-threaded profiling framework, MT-Profiler, to dynamically characterize parallel programs. The framework creates instrumentation and analysis code slices that run in parallel with application threads on different cores. To further reduce the overhead of dynamic instrumentation, MT-Profiler employs a two-stage sampling scheme with several optimizations. It is implemented on DynamoRIO, an open-source dynamic instrumentation and runtime code manipulation system. We evaluate MT-Profiler using the NPB3.3 OpenMP test suite. The results show that MT-Profiler achieves a 3x to 17x speedup over the DynamoRIO perfect profiler, with an average accuracy above 80%.
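The abstract does not specify how the two-stage sampling scheme or the analysis code slices are built, so the following is only a minimal sketch of the general idea under stated assumptions, not MT-Profiler's algorithm. It assumes stage one selects a short burst at the start of every period and stage two keeps every `stride`-th event inside that burst; sampled events are handed to a separate analysis thread through a queue. All names (`two_stage_sampler`, `analysis_worker`, `profile`) and parameters are hypothetical.

```python
import queue
import threading
from collections import Counter


def two_stage_sampler(period, burst_len, stride):
    """Build a predicate that decides whether the n-th event is recorded.

    Assumed scheme (not taken from the paper):
      stage 1 (coarse): only the first `burst_len` events of each `period` are candidates;
      stage 2 (fine): inside a burst, keep every `stride`-th event.
    """
    def keep(n):
        offset = n % period
        return offset < burst_len and offset % stride == 0
    return keep


def analysis_worker(events, counts):
    """Consume sampled basic-block ids until the None sentinel arrives."""
    while True:
        block_id = events.get()
        if block_id is None:
            break
        counts[block_id] += 1


def profile(trace, period=100, burst_len=20, stride=3):
    """Run the 'application' loop while a separate thread does the analysis."""
    keep = two_stage_sampler(period, burst_len, stride)
    events = queue.Queue()
    counts = Counter()
    worker = threading.Thread(target=analysis_worker, args=(events, counts))
    worker.start()
    for n, block_id in enumerate(trace):
        # The application thread only enqueues; counting happens elsewhere.
        if keep(n):
            events.put(block_id)
    events.put(None)  # signal end of trace
    worker.join()
    return counts


if __name__ == "__main__":
    # Hypothetical trace of executed basic-block ids.
    sample_counts = profile(["A", "B"] * 500)
    print(sample_counts)
```

In CPython the global interpreter lock prevents the two threads from running bytecode on separate cores at the same time, so this shows only the structure of the hand-off, not the speedup reported above. The sampled counts are also an approximation of the full profile, which is the accuracy trade-off the abstract refers to.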
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yu, Z., Zhang, W., Tu, X. (2011). MT-Profiler: A Parallel Dynamic Analysis Framework Based on Two-Stage Sampling. In: Temam, O., Yew, PC., Zang, B. (eds) Advanced Parallel Processing Technologies. APPT 2011. Lecture Notes in Computer Science, vol 6965. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24151-2_13
DOI: https://doi.org/10.1007/978-3-642-24151-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24150-5
Online ISBN: 978-3-642-24151-2
eBook Packages: Computer Science, Computer Science (R0)