Enhancing Data Generation in TPCx-HS with a Non-uniform Random Distribution

Nambiar, Raghunath; Rabl, Tilmann; Kulkarni, Karthik; Frank, Michael

doi:10.1007/978-3-319-31409-9_7

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9508)

Abstract

Developed by the Transaction Processing Performance Council, the TPC Express Benchmark™ HS (TPCx-HS) is the industry’s first standard for benchmarking big data systems. It is designed to provide an objective measure of hardware, operating system, and commercial software distributions compatible with the Apache Hadoop File System API, and to provide the industry with verifiable performance, price-performance, and availability metrics [1, 2]. It can be used to compare a broad range of system topologies and implementation methodologies of big data systems in a technically rigorous, directly comparable, and vendor-neutral manner. The modeled application is simple, and the results are highly relevant to hardware and software dealing with big data systems in general. Data generation is derived from TeraGen [3], which produces uniformly distributed data. In this paper the authors propose a normal (Gaussian) distribution, which may be more representative of real-life datasets. The modified TeraGen and the complete changes required to the TPCx-HS kit are included as part of this paper.
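
To make the contrast between the two distributions concrete, the following is a minimal sketch, not the authors' modified TeraGen, which is not reproduced in this preview. It assumes TeraGen's 10-byte record key and shapes only the leading 32 bits of each key. The names `uniform_key`, `gaussian_key`, and `partition_histogram`, as well as the mean and standard deviation, are hypothetical choices made purely for illustration.

```python
import random

KEY_BYTES = 10                          # TeraGen rows are 100 bytes, led by a 10-byte key
HIGH_BITS = 32                          # leading bits shaped by the chosen distribution (assumption)
LOW_BITS = KEY_BYTES * 8 - HIGH_BITS    # remaining bits are always filled uniformly


def _pack(high, rng):
    """Combine the shaped leading bits with uniform trailing bits into a key."""
    low = rng.getrandbits(LOW_BITS)
    return ((high << LOW_BITS) | low).to_bytes(KEY_BYTES, "big")


def uniform_key(rng):
    """Key spread evenly over the whole key space."""
    return _pack(rng.getrandbits(HIGH_BITS), rng)


def gaussian_key(rng, mean=0.5, stddev=0.125):
    """Key clustered around the middle of the key space.

    mean and stddev are fractions of the key space (illustrative values only).
    Draws that fall outside [0, 1) are rejected and redrawn.
    """
    while True:
        frac = rng.gauss(mean, stddev)
        if 0.0 <= frac < 1.0:
            return _pack(int(frac * (1 << HIGH_BITS)), rng)


def partition_histogram(keys, partitions=8):
    """Count keys per equal-width range partition, using the first key byte."""
    counts = [0] * partitions
    for key in keys:
        counts[key[0] * partitions // 256] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    print("uniform :", partition_histogram([uniform_key(rng) for _ in range(8000)]))
    print("gaussian:", partition_histogram([gaussian_key(rng) for _ in range(8000)]))
```

With equal-width range partitions, the uniform keys fill every partition about equally, while the Gaussian keys concentrate in the middle partitions. That kind of skew is what a sort over non-uniform data has to cope with.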

Notes

1. Cf. http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc51.htm

References

1. Nambiar, R., Poess, M., Dey, A., Cao, P., Magdon-Ismail, T., Qi Ren, D., Bond, A.: Introducing TPCx-HS: The First Industry Standard for Benchmarking Big Data Systems. In: Nambiar, R., Poess, M. (eds.) TPCTC 2014. LNCS, vol. 8904, pp. 1–12. Springer, Heidelberg (2015)
2. TPCx-HS Specification. www.tpc.org
3. O’Malley, O.: TeraByte Sort on Apache Hadoop (2008)
4. Nambiar, R., Poess, M.: Keeping the TPC relevant! PVLDB 6(11), 1186–1187 (2013)
5. Nambiar, R., Poess, M. (eds.): TPCTC 2013. LNCS, vol. 8391. Springer, Heidelberg (2014)
6. Nambiar, R.: A standard for benchmarking big data systems. In: BigData Conference 2014, pp. 18–20 (2014)
7. Nambiar, R.: Benchmarking big data systems: introducing TPC express benchmark HS. In: Rabl, T., Sachs, K., Poess, M., Baru, C., Jacobsen, H.-A. (eds.) WBDB 2014. LNCS, vol. 8991, pp. 24–28. Springer, Heidelberg (2015)

Acknowledgements

The authors thank the contributors of the original TPCx-HS development committee, Andrew Bond (Red Hat), Andrew Masland (NEC), Avik Dey (Intel), Brian Caufield (IBM), Chaitanya Baru (SDSC), Da Qi Ren (Huawei), Dileep Kumar (Cloudera), Jamie Reding (Microsoft), John Fowler (Oracle), John Poelman (IBM), Karthik Kulkarni (Cisco), Meikel Poess (Oracle), Mike Brey (Oracle), Mike Crocker (SAP), Paul Cao (HP), Reza Taheri (VMware), Simon Harris (IBM), Tariq Magdon-Ismail (VMware), Wayne Smith (Intel), Yanpei Chen (Cloudera), Michael Majdalany (L&M), Forrest Carman (Owen Media), and Andreas Hotea (Hotea Solutions). Thanks to Manankumar Trivedi for his support with benchmark testing and analysis.

The authors also thank Satinder Sethi for his guidance and support with this effort.

Author information

Authors and Affiliations

Cisco Systems, Inc., 275 East Tasman Drive, San Jose, CA, 95134, USA
Raghunath Nambiar & Karthik Kulkarni
University of Toronto, 27 King’s College Circle, Toronto, ON, M5S, Canada
Tilmann Rabl
Bankmark, Bahnhofstrasse 10, 94032, Passau, Germany
Michael Frank

Corresponding author

Correspondence to Raghunath Nambiar.

Editor information

Editors and Affiliations

Cisco Systems, Inc., San Jose, CA, USA
Raghunath Nambiar
Oracle Corporation, Redwood City, CA, USA
Meikel Poess

About this paper

Cite this paper

Nambiar, R., Rabl, T., Kulkarni, K., Frank, M. (2016). Enhancing Data Generation in TPCx-HS with a Non-uniform Random Distribution. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking: Traditional to Big Data to Internet of Things. TPCTC 2015. Lecture Notes in Computer Science, vol 9508. Springer, Cham. https://doi.org/10.1007/978-3-319-31409-9_7

DOI: https://doi.org/10.1007/978-3-319-31409-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31408-2
Online ISBN: 978-3-319-31409-9
eBook Packages: Computer Science, Computer Science (R0)
