Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10080)

Abstract

The introduction of web-scale operations needed for social media, coupled with easy internet access from mobile devices, has dramatically increased the amount of data generated every day. By conservative estimates, the world generates close to 50,000 GB of data every second, 90% of which is unstructured, and this growth is accelerating. From its origins as a web log processing system at Yahoo, Apache Hadoop has become the industry standard for Big Data processing, owing to its open-source nature and efficient processing.

TPCx-HS was the first benchmark standard from a major industry-standard performance consortium for the Big Data space. It is derived from the Apache Hadoop workloads TeraGen, TeraSort, and TeraValidate. Since its release by the TPC in August 2014, all 18 published results (as of August 2016) have been based on on-premises, bare-metal hardware configurations.

This paper shows how Hadoop can be deployed on an OpenStack cloud using the OpenStack Sahara project, and how TPCx-HS can be used to measure and evaluate the performance of the Cloud under Test (CuT). It also shows how an OpenStack cloud can be optimized so that TPCx-HS performance on the cloud matches that of a bare-metal configuration as closely as possible. Lastly, it shares results and experiences from a Hadoop on Cloud Proof-of-Concept (POC), a study undertaken by the Dell Open Source Solutions team.

Acknowledgments

The authors would like to thank John Terpstra, Michael Pittaro, Randy Perryman, Michael Tondee and David Grimes for participating in the technical review meetings of the POC. Their input, feedback and guidance helped shape this investigation. Mr. Ashok Malani is recognized for his technical leadership of the xFlow Research team that did such a tremendous job performing the tests and drafting this paper.

Correspondence to Nicholas Wakou.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wakou, N., Woodside, M., Kanevsky, A., Khan, F., Arif, M. (2017). TPCx-HS on the Cloud!. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking. Traditional - Big Data - Internet of Things. TPCTC 2016. Lecture Notes in Computer Science, vol 10080. Springer, Cham. https://doi.org/10.1007/978-3-319-54334-5_2

  • DOI: https://doi.org/10.1007/978-3-319-54334-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54333-8

  • Online ISBN: 978-3-319-54334-5

  • eBook Packages: Computer Science, Computer Science (R0)