DOI: 10.1145/2814576.2814735
research-article

vRead: Efficient Data Access for Hadoop in Virtualized Clouds

Published: 24 November 2015

Abstract

With its unlimited scalability and on-demand access to computation and storage, a virtualized cloud platform is a natural match for big data systems such as Hadoop. However, virtualization introduces significant overhead for I/O-intensive applications, caused both by device virtualization and by scheduling delays of VMs or I/O threads. In particular, device virtualization incurs significant CPU overhead because I/O data must be moved across several protection boundaries. We observe that this overhead especially affects the I/O performance of the Hadoop Distributed File System (HDFS). Data read from an HDFS datanode VM must pass through virtual devices multiple times, incurring non-negligible virtualization overhead, even when the client VM and the datanode VM run on the same physical machine. In this paper, we propose vRead, a programmable framework that connects I/O flows from HDFS applications directly to their data. vRead enables direct "reads" of datanode VMs' disk images from the hypervisor. By doing so, vRead significantly reduces device virtualization overhead, improving I/O throughput and saving CPU for Hadoop workloads and other applications that rely on HDFS.
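The core idea in the abstract, serving a co-located HDFS read by reading the datanode VM's disk image directly instead of routing the data through the datanode's virtual disk and the virtual network devices, can be illustrated with a minimal host-side sketch. It is not vRead's implementation, which operates inside the hypervisor. It assumes a raw (not copy-on-write) image file and a hypothetical, precomputed table mapping each HDFS block ID to a byte extent inside that image; deriving such a table from the guest file system's metadata is not modeled. The names `BlockExtent`, `read_block_direct`, and `block_map` are illustrative only.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockExtent:
    """Hypothetical location of one HDFS block inside a raw datanode disk image."""
    offset: int  # byte offset of the block's data within the image file
    length: int  # size of the block in bytes


def read_block_direct(image_path: str, extent: BlockExtent) -> bytes:
    """Read one block straight out of a datanode VM's raw image file on the host.

    No virtual disk or virtual NIC is involved: the host-side reader opens the
    image itself. Sketch only; it ignores coherence with the guest's page cache.
    """
    fd = os.open(image_path, os.O_RDONLY)
    try:
        chunks = []
        pos, remaining = extent.offset, extent.length
        while remaining > 0:
            # pread takes an explicit offset, so no shared file position moves
            chunk = os.pread(fd, remaining, pos)
            if not chunk:  # image ended before the whole extent was read
                raise EOFError(f"image ended at byte {pos}, {remaining} bytes short")
            chunks.append(chunk)
            pos += len(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Build a tiny stand-in "image": 4 KiB of zeros followed by one block's bytes.
    with tempfile.NamedTemporaryFile(delete=False) as img:
        img.write(b"\x00" * 4096 + b"hello, hdfs")
        path = img.name
    # Hypothetical block-ID-to-extent table for that image.
    block_map = {1073741825: BlockExtent(offset=4096, length=11)}
    print(read_block_direct(path, block_map[1073741825]))  # b'hello, hdfs'
    os.remove(path)
```

Positional reads keep the sketch free of shared file-offset state, so several clients could read different blocks from the same image concurrently. A real hypervisor-level design would additionally have to handle non-raw image formats and data the datanode VM has written but not yet flushed, neither of which this sketch attempts.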




Information & Contributors

Information

Published In

ACM Conferences
Middleware '15: Proceedings of the 16th Annual Middleware Conference
November 2015
295 pages
ISBN:9781450336185
DOI:10.1145/2814576
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Middleware '15
Sponsor:
  • ACM
  • USENIX Assoc
  • IFIP
Middleware '15: 16th International Middleware Conference
December 7 - 11, 2015
Vancouver, BC, Canada

Acceptance Rates

Middleware '15 paper acceptance rate: 23 of 118 submissions (19%)
Overall acceptance rate: 203 of 948 submissions (21%)



Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025.

Citations

Cited By

  • (2023) Maximizing VMs' IO Performance on Overcommitted CPUs with Fairness. Proceedings of the 2023 ACM Symposium on Cloud Computing, pp. 93-108. DOI: 10.1145/3620678.3624649. Online publication date: 30-Oct-2023.
  • (2022) Portkey: hypervisor-assisted container migration in nested cloud environments. Proceedings of the 18th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 3-17. DOI: 10.1145/3516807.3516817. Online publication date: 25-Feb-2022.
  • (2021) Enhancing Performance and Energy Efficiency for Hybrid Workloads in Virtualized Cloud Environment. IEEE Transactions on Cloud Computing 9(1), pp. 168-181. DOI: 10.1109/TCC.2018.2837040. Online publication date: 1-Jan-2021.
  • (2020) vSMT-IO. Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference, pp. 449-463. DOI: 10.5555/3489146.3489176. Online publication date: 15-Jul-2020.
  • (2018) Effectively mitigating I/O inactivity in vCPU scheduling. Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference, pp. 267-279. DOI: 10.5555/3277355.3277382. Online publication date: 11-Jul-2018.
  • (2017) Improving spark application throughput via memory aware task co-location. Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, pp. 95-108. DOI: 10.1145/3135974.3135984. Online publication date: 11-Dec-2017.
