TPC-H Benchmark Analytics Scenarios and Performances on Hadoop Data Clouds

  • Conference paper
Networked Digital Technologies (NDT 2012)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 293))

Abstract

NoSQL systems rose alongside internet companies, whose data-management challenges traditional RDBMS solutions could not cope with. Indeed, to handle the continuous growth of data, NoSQL alternatives feature dynamic horizontal scaling rather than vertical scaling. To date, few studies address OLAP benchmarking of NoSQL systems. This paper gives an overview of NoSQL and adjacent technologies and evaluates Hadoop/Pig using the TPC-H benchmark under two cloud scenarios. The first scenario assumes that data is stored on a data cloud and business questions are routed to the cloud for processing; the second assumes that pre-summarized data is computed in a first step and analyzed multidimensionally in a second step. Finally, the paper reports thorough performance tests on Hadoop for various data volumes, workloads, and cluster sizes.
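
The difference between the two scenarios can be illustrated with a minimal sketch. It is written in plain Python rather than Pig Latin, and it is not the paper's implementation: the toy rows, the column choice, and the function names `scenario_one`, `build_summary`, and `scenario_two` are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for TPC-H line items:
# (ship_year, return_flag, extended_price). Not real benchmark data.
ROWS = [
    (1995, "R", 100.0),
    (1995, "N", 250.0),
    (1996, "R", 75.0),
    (1996, "R", 25.0),
]

def scenario_one(rows, year):
    """Scenario 1: every business question scans the raw data."""
    return sum(price for y, _flag, price in rows if y == year)

def build_summary(rows):
    """Scenario 2, step 1: pre-compute revenue per (year, flag) once."""
    summary = defaultdict(float)
    for y, flag, price in rows:
        summary[(y, flag)] += price
    return dict(summary)

def scenario_two(summary, year):
    """Scenario 2, step 2: answer the question from the smaller summary."""
    return sum(total for (y, _flag), total in summary.items() if y == year)

summary = build_summary(ROWS)
# Both scenarios must agree on the answer; only the work per query differs.
assert scenario_one(ROWS, 1996) == scenario_two(summary, 1996)
```

In this sketch the first scenario pays the full scan cost on every question, while the second pays it once up front and then reads only the aggregated groups, which is the trade-off the two scenarios are meant to compare.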

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moussa, R. (2012). TPC-H Benchmark Analytics Scenarios and Performances on Hadoop Data Clouds. In: Benlamri, R. (eds) Networked Digital Technologies. NDT 2012. Communications in Computer and Information Science, vol 293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30507-8_20

  • DOI: https://doi.org/10.1007/978-3-642-30507-8_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30506-1

  • Online ISBN: 978-3-642-30507-8

  • eBook Packages: Computer Science (R0)