DOI:10.1145/1250734.1250777
Article

Online optimizations driven by hardware performance monitoring

Published: 10 June 2007

Abstract

Hardware performance monitors provide detailed direct feedback about application behavior and are an additional source of information that a compiler may use for optimization. A JIT compiler is in a good position to make use of such information because it is running on the same platform as the user applications. As hardware platforms become more and more complex, it becomes more and more difficult to model their behavior. Profile information that captures general program properties (like execution frequency of methods or basic blocks) may be useful, but does not capture sufficient information about the execution platform. Machine-level performance data obtained from a hardware performance monitor can not only direct the compiler to those parts of the program that deserve its attention but also determine if an optimization step actually improved the performance of the application.
This paper presents an infrastructure based on a dynamic compiler+runtime environment for Java that incorporates machine-level information as an additional kind of feedback for the compiler and runtime environment. The low-overhead monitoring system provides fine-grained performance data that can be tracked back to individual Java bytecode instructions. As an example, the paper presents results for object co-allocation in a generational garbage collector that optimizes spatial locality of objects on-line using measurements about cache misses. In the best case, the execution time is reduced by 14% and L1 cache misses by 28%.
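
As a rough illustration of the idea described above, the following Java sketch shows one way sampled cache-miss events, once attributed to a reference field, could drive a co-allocation decision. It is a minimal sketch under stated assumptions and not the paper's implementation: the class `CoallocationSketch`, its methods, the `Class.field` key format, and the threshold parameter are all hypothetical, and the abstract does not specify the actual heuristic used.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: pick the reference field of a class whose loads
 * caused the most sampled cache misses, as a co-allocation candidate.
 * All names here are illustrative assumptions, not the paper's API.
 */
final class CoallocationSketch {
    // Sampled miss counts keyed by "ClassName.fieldName" (assumed format).
    private final Map<String, Long> missesByField = new HashMap<>();

    /** Record one sampled cache miss attributed to a load through the field. */
    void recordMiss(String className, String fieldName) {
        missesByField.merge(className + "." + fieldName, 1L, Long::sum);
    }

    /**
     * Returns the field of className with the most sampled misses,
     * or null if no field has at least `threshold` misses.
     */
    String hottestField(String className, long threshold) {
        String best = null;
        long bestCount = threshold - 1; // a candidate must reach the threshold
        String prefix = className + ".";
        for (Map.Entry<String, Long> e : missesByField.entrySet()) {
            if (e.getKey().startsWith(prefix) && e.getValue() > bestCount) {
                best = e.getKey().substring(prefix.length());
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

In a scheme like this, a copying collector could consult `hottestField` when it promotes an object and place the referenced child next to its parent. Whether that placement helps would still have to be checked against subsequent miss measurements.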

Published In

PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2007
508 pages
ISBN:9781595936332
DOI:10.1145/1250734
  • ACM SIGPLAN Notices, Volume 42, Issue 6
    Proceedings of the 2007 PLDI conference
    June 2007
    491 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1273442
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Java
  2. dynamic optimization
  3. hardware performance monitors
  4. just-in-time compilation

Qualifiers

  • Article

Conference

PLDI '07
Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)0
Reflects downloads up to 02 Mar 2025

Cited By

  • (2018) Bayonet: probabilistic inference for networks. ACM SIGPLAN Notices 53:4 (586-602). DOI:10.1145/3296979.3192400. Online publication date: 11-Jun-2018
  • (2017) SmartGC: Online Memory Management Prediction for PaaS Cloud Models. On the Move to Meaningful Internet Systems. OTM 2017 Conferences (370-388). DOI:10.1007/978-3-319-69462-7_25. Online publication date: 20-Oct-2017
  • (2016) Efficient Management for Hybrid Memory in Managed Language Runtime. Network and Parallel Computing (29-42). DOI:10.1007/978-3-319-47099-3_3. Online publication date: 30-Sep-2016
  • (2014) Efficient code management for dynamic multi-tiered compilation systems. Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java platform: Virtual machines, Languages, and Tools (51-62). DOI:10.1145/2647508.2647513. Online publication date: 23-Sep-2014
  • (2013) Detection of false sharing using machine learning. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (1-9). DOI:10.1145/2503210.2503269. Online publication date: 17-Nov-2013
  • (2013) Trace construction using enhanced performance monitoring. Proceedings of the ACM International Conference on Computing Frontiers (1-10). DOI:10.1145/2482767.2482811. Online publication date: 14-May-2013
  • (2013) Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations. IEEE Transactions on Computers 62:2 (376-389). DOI:10.1109/TC.2011.233. Online publication date: 1-Feb-2013
  • (2012) Identifying the sources of cache misses in Java programs without relying on hardware counters. ACM SIGPLAN Notices 47:11 (133-142). DOI:10.1145/2426642.2259014. Online publication date: 15-Jun-2012
  • (2012) Characterizing continuous time random walks on time varying graphs. ACM SIGMETRICS Performance Evaluation Review 40:1 (307-318). DOI:10.1145/2318857.2254794. Online publication date: 11-Jun-2012
  • (2012) Providing fairness on shared-memory multiprocessors via process scheduling. ACM SIGMETRICS Performance Evaluation Review 40:1 (295-306). DOI:10.1145/2318857.2254792. Online publication date: 11-Jun-2012
