Abstract
The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of the cost-effectiveness of Big Data deployments. Over its first year, the project has produced an open-source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools for gathering insights about systems' cost-performance (ALOJA's Web application, tools, and sources are available at http://aloja.bsc.es). This article describes how the project's focus and research lines evolved over more than a year of continuously benchmarking Hadoop under different configuration and deployment options, presents results, and discusses both the technical and the market-based motivations for these changes. During this time, ALOJA's target has evolved from its earlier low-level profiling of the Hadoop runtime, through extensive benchmarking and the evaluation of a large body of results via aggregation, to its current use of Predictive Analytics (PA) techniques. Modeling benchmark executions allows us to automatically estimate the results of new or untested configurations or hardware set-ups by applying learning techniques to past observations, saving benchmarking time and costs.
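To make the last idea concrete, the following is a minimal sketch of estimating the execution time of an untested configuration from past benchmark runs. It is not ALOJA's implementation: it swaps in plain k-nearest-neighbour regression as the learner, and the `Run` record, its features (`mappers`, `ssd`, `infiniband`), the `predict_exec_time` function, and the toy history values are all hypothetical, invented for illustration rather than taken from the ALOJA repository.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One past benchmark execution (hypothetical feature set)."""
    mappers: int       # concurrent map slots per node
    ssd: int           # 1 = SSD storage, 0 = HDD
    infiniband: int    # 1 = InfiniBand network, 0 = Ethernet
    exec_time: float   # observed execution time in seconds

    def features(self):
        """Configuration part of the run, used as the model input."""
        return (self.mappers, self.ssd, self.infiniband)


def distance(a, b):
    """Euclidean distance between two configuration feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_exec_time(history, config, k=3):
    """Estimate the execution time of `config` as the mean of its k nearest past runs.

    Swapped-in technique (k-NN regression), not ALOJA's own learner.
    """
    if not history:
        raise ValueError("need at least one past observation")
    ranked = sorted(history, key=lambda run: distance(run.features(), config))
    nearest = ranked[:k]
    return sum(run.exec_time for run in nearest) / len(nearest)


# Toy, made-up observations standing in for a repository of past job runs.
toy_history = [
    Run(4, 0, 0, 900.0),
    Run(4, 1, 0, 600.0),
    Run(8, 0, 0, 700.0),
    Run(8, 1, 0, 450.0),
    Run(8, 1, 1, 400.0),
]

if __name__ == "__main__":
    # Estimate an (8 mappers, SSD, InfiniBand) set-up from its 3 nearest neighbours.
    print(f"{predict_exec_time(toy_history, (8, 1, 1)):.1f} s")
```

The features are left unscaled for brevity; with real data, the integer `mappers` count would dominate the binary flags in the distance, so normalizing features or choosing a different regressor would likely matter.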
Notes
1. PCIe NAND flash storage, SSD drives, and InfiniBand networking.
2. Implementing dynamic code interposition, i.e., Aspect-Oriented Programming, is planned.
Acknowledgements
This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067), and the Generalitat de Catalunya (2014-SGR-1051).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Poggi, N., Berral, J.L., Carrera, D. (2016). ALOJA: A Benchmarking and Predictive Platform for Big Data Performance Analysis. In: Rabl, T., Nambiar, R., Baru, C., Bhandarkar, M., Poess, M., Pyne, S. (eds) Big Data Benchmarking. WBDB 2015. Lecture Notes in Computer Science, vol 10044. Springer, Cham. https://doi.org/10.1007/978-3-319-49748-8_4
DOI: https://doi.org/10.1007/978-3-319-49748-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49747-1
Online ISBN: 978-3-319-49748-8
eBook Packages: Computer Science, Computer Science (R0)