FITDOC: fast virtual machines checkpointing with delta memory compression

Du, Yunjie; Shi, Xuanhua; Jin, Hai; Wu, Song; Yang, Laurence T.

doi:10.1007/s11227-015-1429-5

Published: 23 April 2015

Volume 72, pages 3328–3347 (2016)

The Journal of Supercomputing

Abstract

Virtualization makes it possible to save the entire state of the execution environment of a running virtual machine (VM), which makes checkpointing flexible and practical for HPC and data center servers. However, system-level checkpointing must write a large amount of data to disk. Moreover, the overhead grows linearly with VM memory size, which leads to heavy disk I/O consumption and poor system scalability. To address this, we propose a novel fast VM checkpointing approach named Fast Incremental checkpoinTing with Delta memOry Compression (FITDOC). Based on a study of the run-time memory characteristics of different workloads, FITDOC counts dirty data at a fine granularity (i.e., in units of 8 bytes) instead of the conventional page granularity (i.e., in numbers of pages). FITDOC uses a dirty page logging mechanism to record the dirty pages, and a delta memory compression mechanism to eliminate redundant memory data from checkpoint files. To locate the dirty data within dirty pages, FITDOC uses two mechanisms: a fast dirty bitmap scanning method, derived from an analysis of the distribution characteristics of dirty pages in the dirty bitmap, locates the dirty pages, and a multi-threaded data comparison mechanism locates the truly dirty data within each page. Experimental results show that, compared with Xen’s default system-level checkpointing algorithm, FITDOC reduces checkpointing time by 70.54% on average with a 1 GB memory size and achieves greater improvement for VMs with larger memory configurations. Compared with Remus’s incremental solution, which works at page granularity, FITDOC reduces the size of checkpoint data by 52.88% on average. Compared with the default dirty bitmap scanning method in Xen, FITDOC reduces scanning time by 91.13% on average.
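
The two ideas named in the abstract, scanning a dirty bitmap and recording only the 8-byte words that changed within a dirty page, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 4096-byte page size, the byte-wise skip over empty bitmap regions, and the (offset, word) delta format are all assumptions made for illustration, and the multi-threaded comparison is omitted.

```python
PAGE_SIZE = 4096  # bytes per page (assumed; typical for x86)
WORD = 8          # comparison granularity in bytes, as stated in the abstract


def dirty_pages(bitmap: bytes):
    """Yield the indices of set bits in a dirty bitmap.

    All-zero bytes are skipped in one step, so sparse bitmaps are
    scanned without testing every bit (an illustrative shortcut, not
    the paper's scanning method).
    """
    for byte_index, byte in enumerate(bitmap):
        if byte == 0:
            continue
        for bit in range(8):
            if byte & (1 << bit):
                yield byte_index * 8 + bit


def encode_delta(old: bytes, new: bytes):
    """Return (offset, word) pairs for every 8-byte word that changed."""
    assert len(old) == len(new) == PAGE_SIZE
    return [
        (off, new[off:off + WORD])
        for off in range(0, PAGE_SIZE, WORD)
        if old[off:off + WORD] != new[off:off + WORD]
    ]


def apply_delta(old: bytes, delta):
    """Rebuild the new page from the previous checkpoint plus the delta."""
    page = bytearray(old)
    for off, word in delta:
        page[off:off + WORD] = word
    return bytes(page)
```

In this sketch, a page in which only a few words changed yields a delta of a few (offset, word) pairs rather than a full 4096-byte copy, which is the intuition behind saving less data than a page-granularity incremental checkpoint.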

Acknowledgments

This paper is partly supported by the NSFC under Grant No. 61370104 and No. 61433019, MOE-Intel Special Research Fund of Information Technology under Grant MOE-INTEL-2012-01, and Chinese Universities Scientific Fund under Grant No. 2014TS008.

Author information

Authors and Affiliations

Service Computing Technology and Systems Laboratory, Cluster and Grid Computing Laboratory, School of Computer, Huazhong University of Science and Technology, Wuhan, 430074, China
Yunjie Du, Xuanhua Shi, Hai Jin, Song Wu & Laurence T. Yang
Department of Computer Science, St Francis Xavier University, Antigonish, B2G 2W5, Canada
Laurence T. Yang

Corresponding author

Correspondence to Xuanhua Shi.

Additional information

A preliminary version containing some of the results in this paper was published at CSE 2014.

About this article

Cite this article

Du, Y., Shi, X., Jin, H. et al. FITDOC: fast virtual machines checkpointing with delta memory compression. J Supercomput 72, 3328–3347 (2016). https://doi.org/10.1007/s11227-015-1429-5

Published: 23 April 2015
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11227-015-1429-5
