Characterizing and optimizing Java-based HPC applications on Intel many-core architecture

Research and optimization of Java high-performance computing on the Intel many-core architecture

  • Research Paper
Science China Information Sciences

Abstract

The increasing demand for performance has stimulated the wide adoption of many-core accelerators such as the Intel® Xeon Phi™ coprocessor, which is based on Intel's Many Integrated Core architecture. While many HPC applications running in native mode have been tuned to run efficiently on Xeon Phi, it is still unclear how a managed runtime such as the JVM performs on this architecture. In this paper, we present the first measurement study of a set of Java HPC applications running under a JVM on Xeon Phi. One key obstacle to the study is that there is currently little Java support for Xeon Phi. The results presented here are based on the first port of the OpenJDK platform to Xeon Phi, in which the HotSpot virtual machine acts as the kernel execution engine. The main difficulty is the incompatibility between the Xeon Phi ISA and the assembly library of the HotSpot VM. By evaluating the multithreaded Java Grande benchmark suite and our ported Java Phoenix benchmarks, we quantitatively study the performance and scalability issues of the JVM on Xeon Phi and draw several conclusions from the study. To fully utilize the vector computing capability and hide the significant memory access latency on the coprocessor, we present a semi-automatic vectorization scheme and a software prefetching model in HotSpot. Together with 60 physical cores and tuning, our optimized JVM achieves average speedups of 2.7x and 3.5x over a Xeon CPU by using vectorization and prefetching, respectively. Our study also indicates that it is viable and potentially performance-beneficial to run applications written for a managed runtime such as the JVM on Xeon Phi.

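Summary

Highlights

The Xeon Phi coprocessor, based on Intel's Many Integrated Core (MIC) architecture, has become a widely used many-core product in recent years. Java, thanks to its excellent platform portability and steadily improving virtual machine performance, is increasingly applied in high-performance computing. Unfortunately, Intel does not provide Java environment support for Xeon Phi. This paper presents the first port of OpenJDK to the Intel MIC platform and successfully builds a complete Java runtime environment. Using a set of compute-intensive Java programs, we study in detail the throughput and scalability of Java HPC on MIC, and we propose a semi-automatic vectorization model and a data prefetching scheme to address the problems found. Experiments show that the proposed optimizations bring significant performance improvements on MIC, and they demonstrate that the Intel many-core platform has great potential for Java high-performance computing.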

Author information

Corresponding author

Correspondence to Binyu Zang.

About this article

Cite this article

Yu, Y., Lei, T., Chen, H. et al. Characterizing and optimizing Java-based HPC applications on Intel many-core architecture. Sci. China Inf. Sci. 60, 122106 (2017). https://doi.org/10.1007/s11432-015-0989-3

  • DOI: https://doi.org/10.1007/s11432-015-0989-3
