Hardware-Software Collaborative Techniques for Runtime Profiling and Phase Transition Detection

  • Special Section on Advanced Computer Systems Architecture
Journal of Computer Science and Technology

Abstract

Dynamic optimization relies on runtime profile information to improve the performance of program execution. Traditional profiling techniques incur significant overhead and are therefore unsuitable for dynamic optimization. This paper proposes a new profiling technique that combines the strengths of software and hardware to achieve near-zero-overhead profiling. The compiler passes profiling requests to the hardware as a few bits of information in branch instructions, and the processor executes the profiling operations asynchronously, either in available free slots or on dedicated hardware. The compiler instrumentation for this technique is implemented in an Itanium research compiler. The results show that accurate block profiling adds very little overhead to the user program, measured in program scheduling cycles. For example, the average overhead is 0.6% for the SPECint95 benchmarks. The hardware support required for the new profiling technique is practical. The technique is also extended to collect edge profiles for continuous phase transition detection. It is believed that the hardware-software collaborative scheme will enable many profile-driven dynamic optimizations for EPIC processors such as the Itanium processors.
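The division of labor described in the abstract can be illustrated with a small simulation. This is only a toy sketch under stated assumptions, not the paper's design: the names `ProfilingUnit`, `on_branch`, `on_free_slot`, and `run`, the unbounded queue, and the trace format are all hypothetical. Each branch may carry a profiling request (here just a counter identifier), and the counter update is deferred until a free issue slot is available, so the request itself does not stall the user program.

```python
from collections import deque


class ProfilingUnit:
    """Toy model (hypothetical) of hardware that updates profile counters
    off the program's critical path."""

    def __init__(self):
        self.pending = deque()  # profiling requests waiting for a free slot
        self.counters = {}      # counter id -> number of recorded executions

    def on_branch(self, profile_id):
        # A branch that carries a profiling request only enqueues it.
        # Branches without a request (None) are ignored.
        if profile_id is not None:
            self.pending.append(profile_id)

    def on_free_slot(self):
        # At most one queued counter update is performed per free slot.
        if self.pending:
            pid = self.pending.popleft()
            self.counters[pid] = self.counters.get(pid, 0) + 1

    def drain(self):
        # Flush any remaining requests, e.g. at program exit.
        while self.pending:
            self.on_free_slot()


def run(trace, unit):
    """Replay a trace of (profile_id_or_None, free_slots_after_branch) pairs."""
    for profile_id, free_slots in trace:
        unit.on_branch(profile_id)
        for _ in range(free_slots):
            unit.on_free_slot()
    unit.drain()
    return unit.counters


if __name__ == "__main__":
    unit = ProfilingUnit()
    print(run([("B1", 0), ("B2", 2), (None, 1), ("B1", 0)], unit))
```

In this sketch the counts are exact because the queue is unbounded and drained at the end; a real implementation would have to bound the buffer and decide what happens when it fills, which the abstract does not specify.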

Author information

Corresponding author

Correspondence to Youfeng Wu.

Additional information

Youfeng Wu received his B.S. degree from Fudan University and his M.S. and Ph.D. degrees from Oregon State University, in computer science. He is currently a principal engineer with Intel's Corporate Technology Group and manages a research team on multiprocessor compilation and dynamic binary optimizations. His research interests include parallel programming and transformations, multiprocessor architecture, binary and dynamic optimizations, and security and safety enhancement via compiler and binary tools.

Yong-Fong Lee received his M.S. and Ph.D. degrees from Rutgers University, both in computer science. He is currently a principal engineer with Intel's Software and Solutions Group, where he leads a team that works with key ISVs to optimize their solutions on Intel platforms. His technical interests include programming languages and compilers, computer architecture, and performance optimization of server applications.

About this article

Cite this article

Wu, Y., Lee, YF. Hardware-Software Collaborative Techniques for Runtime Profiling and Phase Transition Detection. J Comput Sci Technol 20, 665–675 (2005). https://doi.org/10.1007/s11390-005-0665-1

  • DOI: https://doi.org/10.1007/s11390-005-0665-1
