Abstract
Multi-core systems are ubiquitous in data centers. Efficient exploitation of the hardware parallelism these systems offer is imperative on multiple fronts: minimizing latency and power consumption, and maximizing throughput. This in turn calls for advanced program analysis and optimization, for which call graphs have long been used. Although several static call graph extraction techniques have been proposed in the past, they cannot be applied to programs already running in production. Likewise, existing dynamic call graph extraction tools are of limited use in production owing to, among other things, the lack of support for capturing the wall clock time spent in the functions of a given program and the lack of means to analyze the call graph information captured at run time. In this paper, we present a Pin-based dynamic call graph extraction framework called Trin-Trin. The framework enables extraction of complete, precise, and dynamic call graphs, and it can be used seamlessly with already running applications. Furthermore, an analytics engine is provided to facilitate advanced program analysis; e.g., the different multithreading context(s) of any function can be extracted in a demand-driven fashion. We evaluate the overhead of Trin-Trin using several Unix utilities, applications from the industry-standard SPEC CINT2006 and CFP2006 benchmark suites, and Yahoo! properties. Additionally, we present a case study to illustrate how Trin-Trin can be used to analyze performance bottlenecks and performance regressions.
Cite this article
Jalan, R., Kejariwal, A. Trin-Trin: Who’s Calling? A Pin-Based Dynamic Call Graph Extraction Framework. Int J Parallel Prog 40, 410–442 (2012). https://doi.org/10.1007/s10766-012-0193-x
DOI: https://doi.org/10.1007/s10766-012-0193-x