A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8585)

Abstract

Hadoop Remote Procedure Call (RPC) is increasingly used alongside other data-center middleware such as MapReduce, HDFS, and HBase in many data centers (e.g., Facebook, Yahoo!) because of its simplicity, productivity, and high performance. For RPC systems, achieving low latency and high throughput is critical. However, the current Apache Hadoop distribution lacks a standardized benchmark suite that helps users evaluate the performance of standalone Hadoop RPC. In this paper, we design and develop a micro-benchmark suite that can be used to evaluate the performance of Hadoop RPC in terms of latency and throughput with different data types. We show how this benchmark suite can be used to evaluate the performance of Hadoop RPC over different networks/protocols and parameter configurations on modern clusters.

This research is supported in part by National Science Foundation grants #OCI-0926691, #OCI-1148371 and #CCF-1213084.

Author information

Corresponding author

Correspondence to Xiaoyi Lu.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Lu, X., Wasi-ur-Rahman, M., Islam, N.S., Panda, D.K. (2014). A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks. In: Rabl, T., Raghunath, N., Poess, M., Bhandarkar, M., Jacobsen, H.-A., Baru, C. (eds) Advancing Big Data Benchmarks. WBDB 2013. Lecture Notes in Computer Science, vol 8585. Springer, Cham. https://doi.org/10.1007/978-3-319-10596-3_3

  • DOI: https://doi.org/10.1007/978-3-319-10596-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10595-6

  • Online ISBN: 978-3-319-10596-3

  • eBook Packages: Computer Science, Computer Science (R0)
