Abstract
The gap between peak CPU performance and achieved application performance widens as CPU complexity increases and as the gap between CPU cycle time and DRAM access time grows. Advanced compilers can perform many optimizations that make better use of the cache system, but the application programmer must still carry out some of the optimizations needed for efficient execution. Profiling should therefore be performed on optimized binary code, and performance problems should be reported to the programmer in an intuitive way. Existing performance tools lack adequate functionality to address these needs. Here we introduce source interdependence profiling (SIP) as a paradigm for collecting and presenting performance data to the programmer. SIP identifies the performance problems that remain after compiler optimization and gives intuitive hints, at the source-code level, on how to avoid them. Rather than collecting information only about the events caused directly by each source-code statement, SIP also presents data about events from interdependent statements of source code. A first SIP prototype tool has been implemented; it supports both C and Fortran programs. We describe how the tool was used to improve the performance of the SPEC CPU2000 183.equake application by 59 percent.
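To make the idea of interdependent statements concrete, the following is a minimal sketch and not the SIP prototype itself. It assumes a tiny direct-mapped cache and a hand-written trace of (statement, address) pairs; the names `profile`, `LINE_SIZE`, `NUM_SETS`, and the statement labels "A" and "B" are hypothetical. For every miss it records the statement that caused it and, where known, the statement whose earlier access evicted that cache line.

```python
from collections import Counter

LINE_SIZE = 64   # bytes per cache line (assumed for illustration)
NUM_SETS = 4     # tiny direct-mapped cache (assumed for illustration)


def profile(trace):
    """Replay (statement, address) pairs through a direct-mapped cache.

    Returns two counters:
      misses[stmt]: misses caused directly by each statement
      evicted_by[(stmt, prev)]: misses at `stmt` on a line that `prev`
                                had previously pushed out of the cache
    """
    cache = {}      # set index -> line number currently resident
    evictor = {}    # line number -> statement whose access evicted it
    misses = Counter()
    evicted_by = Counter()

    for stmt, addr in trace:
        line = addr // LINE_SIZE
        index = line % NUM_SETS
        resident = cache.get(index)
        if resident == line:
            continue                    # hit: nothing to record
        misses[stmt] += 1
        if line in evictor:             # line was cached earlier, then evicted
            evicted_by[(stmt, evictor[line])] += 1
        if resident is not None:
            evictor[resident] = stmt    # remember which statement pushed it out
        cache[index] = line
    return misses, evicted_by


# Hypothetical trace: statements "A" and "B" touch addresses mapping to set 0.
trace = [("A", 0), ("B", 256), ("A", 0), ("B", 256)]
misses, evicted_by = profile(trace)
```

A per-statement miss count alone would only show that "A" and "B" each miss twice; the pair counter additionally shows that they evict each other's data, which is the kind of source-level hint the abstract describes.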
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Berg, E., Hagersten, E. (2002). SIP: Performance Tuning through Source Code Interdependence. In: Monien, B., Feldmann, R. (eds) Euro-Par 2002 Parallel Processing. Euro-Par 2002. Lecture Notes in Computer Science, vol 2400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45706-2_22
DOI: https://doi.org/10.1007/3-540-45706-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44049-9
Online ISBN: 978-3-540-45706-0
eBook Packages: Springer Book Archive