Enhancing Data Generation in TPCx-HS with a Non-uniform Random Distribution

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9508)

Abstract

Developed by the Transaction Processing Performance Council, the TPC Express Benchmark™ HS (TPCx-HS) is the industry’s first standard for benchmarking big data systems. It is designed to provide an objective measure of hardware, operating system, and commercial software distributions compatible with the Apache Hadoop File System API, and to provide the industry with verifiable performance, price-performance, and availability metrics [1, 2]. It can be used to compare a broad range of system topologies and implementation methodologies of big data systems in a technically rigorous, directly comparable, and vendor-neutral manner. The modeled application is simple, and the results are highly relevant to hardware and software dealing with big data systems in general. Data generation is derived from TeraGen [3], which uses a uniform distribution of data. In this paper, the authors propose a normal (Gaussian) distribution, which may be more representative of real-life datasets. The modified TeraGen and the complete changes required to the TPCx-HS kit are included as part of this paper.
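
To make the contrast between the two distributions concrete, the following is a minimal illustrative sketch in Python. It is not the modified TeraGen described in the paper. The function names, the choice of mean and standard deviation, and the clamping of out-of-range values are all assumptions made for illustration; only the 10-byte key size follows the TeraGen record format.

```python
import random

KEY_BYTES = 10                      # TeraGen-style records carry a 10-byte key
KEY_SPACE = 1 << (8 * KEY_BYTES)    # number of distinct key values


def uniform_key(rng):
    """Draw a key uniformly from the whole key space."""
    return rng.randrange(KEY_SPACE).to_bytes(KEY_BYTES, "big")


def gaussian_key(rng, mean=KEY_SPACE / 2, stddev=KEY_SPACE / 8):
    """Draw a key from a normal distribution over the key space.

    The mean and standard deviation are illustrative assumptions.
    Values that fall outside the key space are clamped to its edges.
    """
    value = int(rng.gauss(mean, stddev))
    value = min(max(value, 0), KEY_SPACE - 1)
    return value.to_bytes(KEY_BYTES, "big")


def first_byte_histogram(keys, buckets=8):
    """Count keys per equal-width bucket, using the leading key byte."""
    counts = [0] * buckets
    for key in keys:
        counts[key[0] * buckets // 256] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    n = 100_000
    print("uniform :", first_byte_histogram(uniform_key(rng) for _ in range(n)))
    print("gaussian:", first_byte_histogram(gaussian_key(rng) for _ in range(n)))
```

With uniform keys, every bucket receives roughly the same number of records, so range partitions of equal width carry equal load during a sort. With Gaussian keys, the middle buckets receive far more records than the outer ones, which is the kind of skew a sort benchmark would then have to handle.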

Notes

  1. Cf. http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc51.htm.

References

  1. Nambiar, R., Poess, M., Dey, A., Cao, P., Magdon-Ismail, T., Qi Ren, D., Bond, A.: Introducing TPCx-HS: The First Industry Standard for Benchmarking Big Data Systems. In: Nambiar, R., Poess, M. (eds.) TPCTC 2014. LNCS, vol. 8904, pp. 1–12. Springer, Heidelberg (2015)

  2. TPCx-HS Specification. www.tpc.org

  3. O’Malley, O.: TeraByte sort on Apache Hadoop (2008)

  4. Nambiar, R., Poess, M.: Keeping the TPC relevant! PVLDB 6(11), 1186–1187 (2013)

  5. Nambiar, R., Poess, M. (eds.): TPCTC 2013. LNCS, vol. 8391. Springer, Heidelberg (2014)

  6. Nambiar, R.: A standard for benchmarking big data systems. In: BigData Conference 2014, pp. 18–20 (2014)

  7. Nambiar, R.: Benchmarking big data systems: introducing TPC express benchmark HS. In: Rabl, T., Sachs, K., Poess, M., Baru, C., Jacobsen, H.-A. (eds.) WBDB 2014. LNCS, vol. 8991, pp. 24–28. Springer, Heidelberg (2015)

Acknowledgements

The authors thank the contributors of the original TPCx-HS development committee, Andrew Bond (Red Hat), Andrew Masland (NEC), Avik Dey (Intel), Brian Caufield (IBM), Chaitanya Baru (SDSC), Da Qi Ren (Huawei), Dileep Kumar (Cloudera), Jamie Reding (Microsoft), John Fowler (Oracle), John Poelman (IBM), Karthik Kulkarni (Cisco), Meikel Poess (Oracle), Mike Brey (Oracle), Mike Crocker (SAP), Paul Cao (HP), Reza Taheri (VMware), Simon Harris (IBM), Tariq Magdon-Ismail (VMware), Wayne Smith (Intel), Yanpei Chen (Cloudera), Michael Majdalany (L&M), Forrest Carman (Owen Media), and Andreas Hotea (Hotea Solutions). Thanks to Manankumar Trivedi for his support with benchmark testing and analysis.

The authors also thank Satinder Sethi for his guidance and support with this effort.

Author information

Corresponding author

Correspondence to Raghunath Nambiar.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nambiar, R., Rabl, T., Kulkarni, K., Frank, M. (2016). Enhancing Data Generation in TPCx-HS with a Non-uniform Random Distribution. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking: Traditional to Big Data to Internet of Things. TPCTC 2015. Lecture Notes in Computer Science, vol 9508. Springer, Cham. https://doi.org/10.1007/978-3-319-31409-9_7

  • DOI: https://doi.org/10.1007/978-3-319-31409-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31408-2

  • Online ISBN: 978-3-319-31409-9

  • eBook Packages: Computer Science, Computer Science (R0)
