Abstract
NoSQL systems rose alongside internet companies, whose data-management challenges traditional RDBMS solutions could not cope with. In particular, to handle the continuous growth of data, NoSQL alternatives feature dynamic horizontal scaling rather than vertical scaling. To date, few studies address OLAP benchmarking of NoSQL systems. This paper gives an overview of NoSQL and adjacent technologies, and evaluates Hadoop/Pig using the TPC-H benchmark in two different cloud scenarios. The first scenario assumes that data is stored in a data cloud and that business questions are routed to the cloud for processing; the second assumes that pre-summarized data is computed in a first step and multidimensional analysis is performed in a second step. Finally, the paper reports thorough performance tests on Hadoop for various data volumes, workloads, and cluster sizes.
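The difference between the two scenarios can be illustrated with a minimal sketch. It is plain, single-process Python rather than Pig Latin or Hadoop, and it is not the paper's implementation. The row layout loosely imitates a few TPC-H LINEITEM columns, and all names (`ROWS`, `scan_query`, `build_summary`, `summary_query`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for a few TPC-H LINEITEM columns:
# (returnflag, linestatus, quantity, extendedprice)
ROWS = [
    ("A", "F", 10, 100.0),
    ("A", "F", 5, 50.0),
    ("N", "O", 7, 70.0),
    ("R", "F", 3, 30.0),
    ("N", "O", 1, 10.0),
]


def scan_query(rows, returnflag):
    """Scenario 1: every business question scans the raw rows."""
    return sum(qty for flag, _status, qty, _price in rows if flag == returnflag)


def build_summary(rows):
    """Scenario 2, step 1: pre-aggregate once by (returnflag, linestatus)."""
    summary = defaultdict(lambda: {"qty": 0, "revenue": 0.0})
    for flag, status, qty, price in rows:
        cell = summary[(flag, status)]
        cell["qty"] += qty
        cell["revenue"] += price
    return dict(summary)


def summary_query(summary, returnflag):
    """Scenario 2, step 2: answer the question from the small summary."""
    return sum(
        cell["qty"]
        for (flag, _status), cell in summary.items()
        if flag == returnflag
    )


if __name__ == "__main__":
    summary = build_summary(ROWS)
    for flag in ("A", "N", "R"):
        # Both scenarios must agree; only the amount of data read differs.
        print(flag, scan_query(ROWS, flag), summary_query(summary, flag))
```

In this toy setting both approaches return the same totals. The second reads only the pre-aggregated cells at query time, at the cost of an up-front summarization pass.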
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moussa, R. (2012). TPC-H Benchmark Analytics Scenarios and Performances on Hadoop Data Clouds. In: Benlamri, R. (eds) Networked Digital Technologies. NDT 2012. Communications in Computer and Information Science, vol 293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30507-8_20
DOI: https://doi.org/10.1007/978-3-642-30507-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30506-1
Online ISBN: 978-3-642-30507-8
eBook Packages: Computer Science, Computer Science (R0)