Abstract
The performance of MapReduce-based Cloud data warehouses depends mainly on the virtual hardware resources allocated to them. In most cases, these resources are fixed values selected or imposed by the Cloud service provider. However, matching the virtual resources, such as the number of CPUs, the amount of RAM, and the network bandwidth, to the workload demands of a query yields a better-tuned system and improves response times when querying large data sets. In this study, we carried out a set of experiments with Hadoop Hive, a well-known MapReduce SQL translator, on the TPC-H decision support benchmark database to analyze how sensitive query performance is to different virtual resource settings. Our results provide valuable hints for decision makers who design efficient MapReduce-based data warehouses on the Cloud.
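The experimental design described in the abstract (running the same queries under different virtual resource settings and comparing response times) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' harness: VMConfig, config_grid, measure, sensitivity and the run_query callback are names introduced here, and the low-versus-high response-time ratio is one simple sensitivity measure assumed for illustration rather than the metric used in the paper.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class VMConfig:
    """One virtual resource setting (hypothetical field names)."""
    vcpus: int
    ram_gb: int
    bandwidth_mbps: int


def config_grid(vcpus: Sequence[int], ram_gb: Sequence[int],
                bandwidth_mbps: Sequence[int]) -> List[VMConfig]:
    """Full-factorial grid of candidate virtual resource settings."""
    return [VMConfig(c, r, b) for c, r, b in product(vcpus, ram_gb, bandwidth_mbps)]


def measure(configs: Sequence[VMConfig], queries: Sequence[str],
            run_query: Callable[[VMConfig, str], float]
            ) -> Dict[Tuple[VMConfig, str], float]:
    """Response time in seconds for every (config, query) pair.

    run_query is supplied by the caller (an assumption, not a Hive API),
    e.g. a wrapper that provisions the VM and times one Hive query.
    """
    return {(cfg, q): run_query(cfg, q) for cfg in configs for q in queries}


def sensitivity(times: Dict[Tuple[VMConfig, str], float],
                query: str, factor: str) -> float:
    """Mean response time at the lowest level of one factor divided by
    the mean at its highest level, averaged over all other factors.

    Values well above 1.0 suggest the query benefits from more of that
    resource; values near 1.0 suggest adding more of it buys little.
    """
    rows = [(getattr(cfg, factor), t) for (cfg, q), t in times.items() if q == query]
    levels = sorted({level for level, _ in rows})
    low = mean(t for level, t in rows if level == levels[0])
    high = mean(t for level, t in rows if level == levels[-1])
    return low / high
```

In practice, run_query would configure a virtual machine with the given settings, execute a TPC-H query through Hive, and return the elapsed wall-clock time; averaging several runs per setting would help separate resource effects from measurement noise.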
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Dokeroglu, T., Cınar, M.S., Sert, S.A., Cosar, A., Yazıcı, A. (2016). Improving Hadoop Hive Query Response Times Through Efficient Virtual Resource Allocation. In: Andreasen, T., et al. Flexible Query Answering Systems 2015. Advances in Intelligent Systems and Computing, vol 400. Springer, Cham. https://doi.org/10.1007/978-3-319-26154-6_17
DOI: https://doi.org/10.1007/978-3-319-26154-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26153-9
Online ISBN: 978-3-319-26154-6
eBook Packages: Computer Science, Computer Science (R0)