Abstract
Advancements in the field of big data have led to increasing interest in accelerator-based computing as a solution for computationally intensive problems. However, many prevalent big data frameworks are built and run on top of the Java Virtual Machine (JVM), which does not explicitly offer support for accelerated computing with, e.g., GPGPUs or FPGAs. One major challenge in combining JVM-based big data frameworks with accelerators is transferring data from objects that reside in JVM-managed memory to the accelerator. In this paper, a rigorous analysis of possible solutions to this challenge is presented. Furthermore, a tool is presented that generates the required code for four alternative solutions and measures the attainable data transfer speed for a given object graph. This gives researchers and designers quick insight into whether the interface between the JVM and the accelerator can saturate the computational resources of their accelerator. The benchmarking tool was run on a POWER8 system. The results show that, depending on the object size and the collection size, an approach based on the Java Native Interface can achieve between 0.9 and 12 GB/s, ByteBuffers can achieve between 0.7 and 3.3 GB/s, the Unsafe library can achieve between 0.8 and 16 GB/s, and an approach that accesses the data directly can achieve between 3 and 67 GB/s. From our measurements, we conclude that the HotSpot VM does not yet have standardized interfaces by design that can saturate common bandwidths to accelerators seen today or in the future, although one of the approaches presented in this paper can overcome this limitation.
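To make the ByteBuffer approach mentioned in the abstract more concrete, the following is a minimal sketch, not the code generated by the authors' tool. It assumes a hypothetical `Point` class with one `int` and two `double` fields, copies each field into a direct (off-heap) `ByteBuffer`, and reports a rough throughput figure. The class name, the field layout, and the single-shot timing are all assumptions made for illustration; a serious measurement would need warm-up iterations and repeated runs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferTransferSketch {
    // Hypothetical example object; the object graphs used in the paper are not shown here.
    static final class Point {
        final int id;
        final double x;
        final double y;

        Point(int id, double x, double y) {
            this.id = id;
            this.x = x;
            this.y = y;
        }
    }

    // One int (4 bytes) plus two doubles (8 bytes each) per object.
    static final int BYTES_PER_POINT = Integer.BYTES + 2 * Double.BYTES;

    // Copies every field of every object into an off-heap buffer that
    // native code or a device driver could read without touching the JVM heap.
    static ByteBuffer pack(Point[] points) {
        ByteBuffer buf = ByteBuffer.allocateDirect(points.length * BYTES_PER_POINT)
                                   .order(ByteOrder.nativeOrder());
        for (Point p : points) {
            buf.putInt(p.id).putDouble(p.x).putDouble(p.y);
        }
        buf.flip(); // switch from writing to reading: position = 0, limit = bytes written
        return buf;
    }

    // Bytes per nanosecond is numerically equal to GB/s (10^9 bytes per 10^9 ns).
    static double gigabytesPerSecond(long bytes, long nanos) {
        return (double) bytes / nanos;
    }

    public static void main(String[] args) {
        Point[] points = new Point[1 << 20];
        for (int i = 0; i < points.length; i++) {
            points[i] = new Point(i, i * 0.5, i * 2.0);
        }

        long start = System.nanoTime();
        ByteBuffer buf = pack(points);
        long elapsed = System.nanoTime() - start;

        System.out.printf("packed %d bytes at %.2f GB/s%n",
                buf.remaining(), gigabytesPerSecond(buf.remaining(), elapsed));
    }
}
```

The sketch writes field by field, so every object costs several bounds-checked buffer writes. That per-field overhead is one plausible reason why a field-wise copy can fall short of raw memory bandwidth, but the sketch itself does not establish that.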
Notes
- 1. This depends on whether the representation of the array in the VM is the same as the native representation, and on whether the VM garbage collector supports “pinning”.
Acknowledgment
The authors would like to thank Erik Vermij for his help using the POWER8 system and the Texas Advanced Computing Center and their partners for access to the hardware. This work was supported by the European Commission in the context of the ARTEMIS project ALMARVI (project #621439).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Peltenburg, J., Hesam, A., Al-Ars, Z. (2017). Pushing Big Data into Accelerators: Can the JVM Saturate Our Hardware?. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_18
DOI: https://doi.org/10.1007/978-3-319-67630-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67629-6
Online ISBN: 978-3-319-67630-2
eBook Packages: Computer Science (R0)