ABSTRACT
Analyzing and predicting meteorological phenomena in real time is important. Parallel programming that exploits the thousands of threads available on GPUs can speed up the execution of many applications. However, GPUs have limitations when used for processing big data, which can be better analyzed using distributed computing platforms such as Hadoop and Spark. In this paper, we propose DAMB, a system that processes streamed data on a heterogeneous cluster of CPUs and GPUs in real time. The core of DAMB is SparkGPU, a platform that extends Apache Spark so that it can manage a heterogeneous cluster containing both CPUs and GPUs and execute tasks on the GPUs. DAMB also provides data visualization tools that present the analyzed data interactively in real time. As a case study, we focus on a meteorological application that analyzes lightning discharges. We show that DAMB can successfully process and analyze the meteorological data streamed to it and visualize the results in real time on a 12-node cluster in which each node is equipped with one or more GPU cards. This corresponds to a speedup of two orders of magnitude over a sequential implementation of the same application.
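As a rough illustration of what processing streamed data in small time slices can look like, the sketch below groups timestamped events into fixed-length micro-batches and computes one summary value per batch. It is a plain-Python sketch under stated assumptions, not DAMB's or SparkGPU's implementation: the event shape `(timestamp, reading)`, the function names `micro_batches` and `peak_per_batch`, and the one-second interval are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp in seconds, sensor reading).
Event = Tuple[float, float]


def micro_batches(events: Iterable[Event], interval_s: float) -> Dict[int, List[Event]]:
    """Group events into fixed-length time windows keyed by window index."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    batches: Dict[int, List[Event]] = defaultdict(list)
    for ts, value in events:
        # Floor division maps a timestamp to the index of its window.
        batches[int(ts // interval_s)].append((ts, value))
    return dict(batches)


def peak_per_batch(batches: Dict[int, List[Event]]) -> Dict[int, float]:
    """Return the largest reading observed in each window."""
    return {idx: max(v for _, v in evs) for idx, evs in sorted(batches.items())}


# Made-up sample readings, used only to exercise the functions.
events = [(0.2, 1.0), (0.9, 3.5), (1.1, 2.0), (2.7, 0.5)]
batches = micro_batches(events, interval_s=1.0)
peaks = peak_per_batch(batches)
```

In a cluster setting each micro-batch would be handed to a worker as an independent task, which is what allows the per-batch analysis to run in parallel.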
Index Terms
- A real-time big data analysis framework on a CPU/GPU heterogeneous cluster: a meteorological application case study