ALOJA: A Benchmarking and Predictive Platform for Big Data Performance Analysis

  • Conference paper
Big Data Benchmarking (WBDB 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10044)

Abstract

The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of the cost-effectiveness of Big Data deployments. Over its first year, the project has produced an open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about a system’s cost-performance (ALOJA’s Web application, tools, and sources are available at http://aloja.bsc.es). This article describes the evolution of the project’s focus and research lines over more than a year of continuously benchmarking Hadoop under different configuration and deployment options, presents results, and discusses the technical and market-based motivation for these changes. During this time, ALOJA’s target has evolved from low-level profiling of the Hadoop runtime, through extensive benchmarking and evaluation of a large body of results via aggregation, to its current use of Predictive Analytics (PA) techniques. Modeling benchmark executions allows us to automatically estimate the results of new or untested configurations or hardware set-ups by applying learning techniques to past observations, saving benchmarking time and costs.

Notes

  1. Storage PCIe NAND flash, SSD drives, and InfiniBand networking.

  2. Implementing dynamic code interposition (i.e., Aspect-Oriented Programming) is planned.

Acknowledgements

This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051).

Author information

Corresponding author

Correspondence to Nicolas Poggi.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Poggi, N., Berral, J.L., Carrera, D. (2016). ALOJA: A Benchmarking and Predictive Platform for Big Data Performance Analysis. In: Rabl, T., Nambiar, R., Baru, C., Bhandarkar, M., Poess, M., Pyne, S. (eds) Big Data Benchmarking. WBDB 2015. Lecture Notes in Computer Science, vol 10044. Springer, Cham. https://doi.org/10.1007/978-3-319-49748-8_4

  • DOI: https://doi.org/10.1007/978-3-319-49748-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49747-1

  • Online ISBN: 978-3-319-49748-8

  • eBook Packages: Computer Science, Computer Science (R0)