Abstract
The fork-join model for parallel computing has become very popular and has been part of the Java class library since Java 7. While understanding and optimizing the performance of fork-join computations is of paramount importance, accurately profiling them on the Java Virtual Machine (JVM) is challenging due to the complexity of the API. In this paper, we present a novel model for analyzing fork-join computations on the JVM that addresses the peculiarities of the Java fork-join framework, including features such as task unforking and task reuse. We implement our model in a profiler that detects every spawned fork-join task, captures all task dependencies, and aims to collect cycle-accurate task-granularity data. We evaluate our profiler against a dedicated fork-join profiler for the JVM, showing that our tool achieves higher profile accuracy and introduces less overhead.
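To make the two framework features named in the abstract concrete, the following minimal sketch uses only the standard `java.util.concurrent` API (`fork`, `join`, `tryUnfork`, `reinitialize`). The `RangeSum` task, its threshold, and the `sumTwice` helper are hypothetical names chosen for illustration; they are not taken from the paper or its profiler.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinFeatures {

    // Hypothetical task (not from the paper): sums the integers in [lo, hi).
    static final class RangeSum extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // arbitrary cut-off for sequential work
        private final int lo;
        private final int hi;

        RangeSum(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += i;
                }
                return sum;
            }
            int mid = lo + (hi - lo) / 2;
            RangeSum left = new RangeSum(lo, mid);
            RangeSum right = new RangeSum(mid, hi);

            left.fork();                     // schedule the left half asynchronously
            long rightSum = right.compute(); // run the right half in the current thread

            long leftSum;
            if (left.tryUnfork()) {
                // Task unforking: left was still queued and not stolen,
                // so it is unscheduled and executed directly here.
                leftSum = left.compute();
            } else {
                // Left was stolen or already started elsewhere: wait for its result.
                leftSum = left.join();
            }
            return leftSum + rightSum;
        }
    }

    // Runs the same task object twice to illustrate task reuse.
    static long[] sumTwice(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            RangeSum task = new RangeSum(0, n);
            long first = pool.invoke(task);
            // Task reuse: reset the completed task's bookkeeping state
            // so the same object can be executed again.
            task.reinitialize();
            long second = pool.invoke(task);
            return new long[] {first, second};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] sums = sumTwice(100_000);
        System.out.println(sums[0] + " " + sums[1]);
    }
}
```

In this sketch a single task object can be executed more than once, and a forked task may never be joined because it was unforked and run directly. Both behaviors are examples of the framework peculiarities that a task-level analysis has to account for.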
Notes
1. Reference cycles are the clock cycles elapsed during an operation, collected at the nominal processor frequency (regardless of frequency scaling). The paper uses the term “cycles” to indicate “reference cycles” for short.
2. \(\mathsf{FJProf}\) is a fork-join-specific version of the task-granularity profiler tgp [24].
3. Average overheads and accuracies across multiple workloads are computed using the geometric mean.
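As a small illustration of averaging with the geometric mean, the sketch below computes it in log space for a set of per-workload values. The class name, method name, and sample overhead factors are hypothetical, and the sketch assumes the values are positive ratios; the notes do not state how the paper expresses overheads or accuracies.

```java
public class GeometricMean {

    // Geometric mean of strictly positive values, computed as
    // exp(mean(log(v))) to avoid overflow in the running product.
    static double geometricMean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double logSum = 0.0;
        for (double v : values) {
            if (v <= 0.0) {
                throw new IllegalArgumentException("values must be positive");
            }
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    public static void main(String[] args) {
        // Hypothetical overhead factors for three workloads (not measured data).
        double[] overheads = {1.0, 4.0, 2.0};
        System.out.println(geometricMean(overheads));
    }
}
```

For the sample values the result is the cube root of 8, that is, approximately 2.0, whereas the arithmetic mean would be about 2.33. The geometric mean is less dominated by a single large ratio, which is a common reason for using it when averaging normalized results.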
References
Adhianto, L., et al.: HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurrency Comput. Pract. Exp. 22(6), 685–701 (2010). https://doi.org/10.1002/cpe.1553
Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996). https://doi.org/10.1145/209936.209958
Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46(5), 720–748 (1999). https://doi.org/10.1145/324133.324234
Chen, S., et al.: Scheduling threads for constructive cache sharing on CMPs. In: SPAA, pp. 105–115 (2007). https://doi.org/10.1145/1248377.1248396
Conway, M.E.: A multiprocessor system design. In: AFIPS, pp. 139–146 (1963). https://doi.org/10.1145/1463822.1463838
Fonseca, A., Cabral, B.: Evaluation of runtime cut-off approaches for parallel programs. In: VECPAR, pp. 121–134 (2016). https://doi.org/10.1007/978-3-319-61982-8_13
Fonseca, A., Stork, S.: AeminiumBenchmarks (2016). https://github.com/AEminium/AeminiumBenchmarks
Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. SIGPLAN Not. 33(5), 212–223 (1998). https://doi.org/10.1145/277650.277725
Guo, Y., Barik, R., Raman, R., Sarkar, V.: Work-first and help-first scheduling policies for async-finish task parallelism. In: IPDPS, pp. 1–12 (2009). https://doi.org/10.1109/IPDPS.2009.5161079
Haller, P., Tu, S.: The Scala Actors API (2022). https://docs.scala-lang.org/overviews/core/actors.html
He, Y., Leiserson, C.E., Leiserson, W.M.: The Cilkview scalability analyzer. In: SPAA, pp. 145–156 (2010). https://doi.org/10.1145/1810479.1810509
ICL: PAPI (2021). http://icl.utk.edu/papi
Lea, D.: A Java Fork/Join framework. In: JAVA, pp. 36–43 (2000). https://doi.org/10.1145/337449.337465
Lifflander, J., Krishnamoorthy, S., Kale, L.V.: Steal tree: low-overhead tracing of work stealing schedulers. In: PLDI, pp. 507–518 (2013). https://doi.org/10.1145/2499370.2462193
Marek, L., et al.: ShadowVM: robust and comprehensive dynamic program analysis for the Java platform. ACM SIGPLAN Not. 49(3), 105–114 (2013). https://doi.org/10.1145/2517208.2517219
Marek, L., Villazón, A., Zheng, Y., Ansaloni, D., Binder, W., Qi, Z.: DiSL: a domain-specific language for bytecode instrumentation. In: AOSD, pp. 239–250 (2012). https://doi.org/10.1145/2162049.2162077
Mohr, B., Brown, D., Malony, A.: TAU: a portable parallel program analysis environment for pC++. In: CONPAR – VAPP VI, pp. 29–40 (1994). https://doi.org/10.1007/3-540-58430-7_4
Nyman, L., Laakso, M.: Notes on the history of Fork and Join. IEEE Ann. Hist. Comput. 38(3), 84–87 (2016). https://doi.org/10.1109/MAHC.2016.34
Oracle: Package java.util.stream (2021). https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Stream.html
Oracle: ForkJoinTask (2022). https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/ForkJoinTask.html
Prokopec, A., et al.: Renaissance: benchmarking suite for parallel applications on the JVM. In: PLDI, pp. 31–47 (2019). https://doi.org/10.1145/3314221.3314637
Renaissance Suite: Documentation Overview (2019). https://renaissance.dev/docs
Rosà, A., Rosales, E., Binder, W.: Analysis and optimization of task granularity on the Java Virtual Machine. ACM Trans. Program. Lang. Syst. 41(3) (2019). https://doi.org/10.1145/3338497
Rosà, A.: TGP (2022). https://github.com/fithos/tgp
Rosales, E., Rosà, A., Binder, W.: FJProf: profiling Fork/Join applications on the Java Virtual Machine. In: VALUETOOLS, pp. 128–135 (2020). https://doi.org/10.1145/3388831.3388851
Rosà, A., Binder, W.: Optimizing type-specific instrumentation on the JVM with reflective supertype information. J. Vis. Lang. Comput. 49, 29–45 (2018). https://doi.org/10.1016/j.jvlc.2018.10.007
Schardl, T.B., Kuszmaul, B.C., Lee, I.T.A., Leiserson, W.M., Leiserson, C.E.: The Cilkprof scalability profiler. In: SPAA, pp. 89–100 (2015). https://doi.org/10.1145/2755573.2755603
Tallent, N.R., Mellor-Crummey, J.M.: Identifying performance bottlenecks in work-stealing computations. Computer 42(12), 44–50 (2009). https://doi.org/10.1109/MC.2009.396
Teng, Q.M., Wang, H.C., Xiao, Z., Sweeney, P.F., Duesterwald, E.: THOR: a performance analysis tool for Java applications running on multicore systems. IBM J. Res. Dev. 54(5), 4:1–4:17 (2010). https://doi.org/10.1147/JRD.2010.2058481
The Clojure Team: Reducers (2019). https://clojure.org/reference/reducers
The GPars Team: GPars - A Concurrency & Parallelism Framework for Groovy and Java (2016). http://www.gpars.org
Acknowledgments
This work has been supported by Oracle (ERO project 1332) and by the Swiss National Science Foundation (project 200020_188688).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Basso, M., Rosales, E., Schiavio, F., Rosà, A., Binder, W. (2022). Accurate Fork-Join Profiling on the Java Virtual Machine. In: Cano, J., Trinder, P. (eds) Euro-Par 2022: Parallel Processing. Euro-Par 2022. Lecture Notes in Computer Science, vol 13440. Springer, Cham. https://doi.org/10.1007/978-3-031-12597-3_3
DOI: https://doi.org/10.1007/978-3-031-12597-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12596-6
Online ISBN: 978-3-031-12597-3
eBook Packages: Computer Science, Computer Science (R0)