An ARM-Based Hadoop Performance Evaluation Platform: Design and Implementation

  • Conference paper

Abstract

As cluster scale grows, power consumption will become a major bottleneck for future large-scale high-performance clusters. However, most existing cloud clusters are based on power-hungry x86-64 processors, which are aimed merely at common enterprise applications. In this paper, we improve cluster performance by leveraging energy-efficient ARM SoCs. On our prototype, a cluster of five Cubieboard4 boards, we run HPL and achieve 9.025 GFLOPS, which shows considerable computational potential. Moreover, we build a measurement model and conduct an extensive evaluation, comparing the performance of the cluster on WordCount, k-Means, and other workloads running in MapReduce mode and in Spark mode. The experimental results demonstrate that, with the RDD feature of Spark, our cluster can deliver higher computational efficiency on compute-intensive workloads. Finally, we propose a more suitable theoretical hybrid architecture for future cloud clusters, with a stronger master and customized ARMv8-based TaskTrackers for data-intensive computing.
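
For readers unfamiliar with the WordCount benchmark named above, the following is a minimal single-process sketch of its map, shuffle, and reduce phases. It is not the authors' code and does not use Hadoop or Spark; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, mimicking the step between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the grouped counts to get one total per word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory collection of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["ARM cluster", "Spark on ARM"]))
```

In a real cluster the map and reduce phases run on different nodes and the shuffle moves data over the network, which is where the MapReduce and Spark execution modes compared in the paper would differ; this sketch keeps everything in memory on one machine.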


Corresponding author

Correspondence to Xiaohu Fan.

Copyright information

© 2016 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Fan, X. et al. (2016). An ARM-Based Hadoop Performance Evaluation Platform: Design and Implementation. In: Guo, S., Liao, X., Liu, F., Zhu, Y. (eds) Collaborative Computing: Networking, Applications, and Worksharing. CollaborateCom 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 163. Springer, Cham. https://doi.org/10.1007/978-3-319-28910-6_8

  • DOI: https://doi.org/10.1007/978-3-319-28910-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28909-0

  • Online ISBN: 978-3-319-28910-6

  • eBook Packages: Computer Science, Computer Science (R0)
