DOI: 10.1145/3006299.3006304
research-article

A real-time big data analysis framework on a CPU/GPU heterogeneous cluster: a meteorological application case study

Published: 06 December 2016

ABSTRACT

It is important to analyze and predict meteorological phenomena in real time. Parallel programs that exploit the thousands of threads available on GPUs can speed up many applications. However, GPUs have limitations when processing big data, which is better analyzed on distributed computing platforms such as Hadoop and Spark. In this paper, we propose DAMB, a system that processes streamed data in real time on a heterogeneous cluster of CPUs and GPUs. The core of DAMB is SparkGPU, a platform that extends Apache Spark so that it can manage a heterogeneous cluster containing both CPUs and GPUs and execute tasks on the GPUs. DAMB also provides visualization tools that present the analyzed data interactively in real time. As a case study, we focus on a meteorological application that analyzes lightning discharges. We show that DAMB can successfully process and analyze the meteorological data streamed to it and visualize the results in real time on a 12-node cluster in which each node is equipped with one or more GPU cards. This is a speedup of two orders of magnitude compared to a sequential implementation of the same application.
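
The abstract does not say how SparkGPU assigns work to nodes, so the following is only a minimal sketch of the general idea of cutting a stream into micro-batches and routing them across a mixed CPU/GPU cluster. The `Worker` class, the `micro_batches` and `dispatch` functions, and the size-based `gpu_threshold` rule are all hypothetical names and assumptions introduced for illustration; they are not taken from DAMB or from the Apache Spark API.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterable, Iterator, List


@dataclass
class Worker:
    """Hypothetical cluster node; `has_gpu` marks nodes with a GPU card."""
    name: str
    has_gpu: bool
    assigned: List[int] = field(default_factory=list)  # ids of batches sent here


def micro_batches(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Cut a (possibly unbounded) stream into micro-batches of at most `size` items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def dispatch(batches: Iterable[List[float]],
             workers: List[Worker],
             gpu_threshold: int) -> List[Worker]:
    """Assumed policy: batches with at least `gpu_threshold` items go to GPU
    nodes, smaller ones to CPU-only nodes, round-robin within each group.
    If the preferred group is empty, any node may be used."""
    gpu_nodes = [w for w in workers if w.has_gpu]
    cpu_nodes = [w for w in workers if not w.has_gpu]
    counters = {"gpu": 0, "cpu": 0}
    for batch_id, batch in enumerate(batches):
        kind = "gpu" if len(batch) >= gpu_threshold else "cpu"
        pool = (gpu_nodes if kind == "gpu" else cpu_nodes) or workers
        worker = pool[counters[kind] % len(pool)]
        counters[kind] += 1
        worker.assigned.append(batch_id)
    return workers
```

For example, calling `dispatch(micro_batches(range(10), 4), workers, gpu_threshold=4)` with two GPU nodes and one CPU-only node sends the two full batches to the GPU nodes and the short trailing batch to the CPU-only node. A real scheduler would also weigh data locality and device load, which this sketch ignores.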


      • Published in

        BDCAT '16: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
        December 2016
        373 pages
        ISBN: 9781450346177
        DOI: 10.1145/3006299

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 27 of 93 submissions, 29%
