
FITDOC: fast virtual machines checkpointing with delta memory compression

The Journal of Supercomputing

Abstract

Virtualization makes it possible to save the entire execution state of a running virtual machine (VM), which makes checkpointing flexible and practical for HPC and data center servers. However, system-level checkpointing must write a large amount of data to disk. Moreover, the overhead grows linearly with the size of the VM’s memory, which leads to heavy disk I/O consumption and poor system scalability. To address this, we propose a novel fast VM checkpointing approach named Fast Incremental checkpoinTing with Delta memOry Compression (FITDOC). By studying the run-time memory characteristics of different workloads, FITDOC measures dirty data at a fine granularity (i.e., in units of 8 bytes) instead of at the conventional page granularity (i.e., in number of pages). FITDOC uses a dirty page logging mechanism to record the dirty pages, and a delta memory compression mechanism to eliminate redundant memory data from checkpoint files. To locate the dirty data within dirty pages, FITDOC uses two mechanisms: a fast dirty bitmap scanning method, derived from an analysis of how dirty pages are distributed in the dirty bitmap, locates the dirty pages, and a multi-threaded data comparison mechanism locates the data that actually changed within each page. Experimental results show that, compared with Xen’s default system-level checkpointing algorithm, FITDOC reduces checkpointing time by 70.54 % on average with a 1 GB memory size and achieves greater improvement for VMs with larger memory configurations. FITDOC reduces the size of checkpoint data by 52.88 % on average compared with Remus’s page-granularity incremental solution. Compared with the default dirty bitmap scanning method in Xen, FITDOC decreases scanning time by 91.13 % on average.
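
The two ideas named in the abstract, scanning a dirty bitmap quickly and saving only the 8-byte units that changed within a dirty page, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the bit ordering of the bitmap, the chunked zero-skipping scan, and the `(offset, word)` delta record format are all assumptions made for illustration, and the multi-threaded comparison is omitted for brevity.

```python
PAGE_SIZE = 4096  # bytes per page (assumed 4 KiB pages)
WORD = 8          # comparison granularity in bytes, as in the abstract


def dirty_page_indices(bitmap: bytes, chunk: int = 8):
    """Yield indices of set bits in a dirty bitmap (hypothetical layout:
    page p maps to bit p % 8 of byte p // 8, least significant bit first).

    Each chunk is loaded as one integer, so an all-zero chunk is skipped
    with a single test instead of being checked bit by bit.
    """
    for off in range(0, len(bitmap), chunk):
        value = int.from_bytes(bitmap[off:off + chunk], "little")
        base = off * 8
        while value:
            low = value & -value              # isolate the lowest set bit
            yield base + low.bit_length() - 1
            value ^= low                      # clear that bit and continue


def page_delta(old: bytes, new: bytes):
    """Return (offset, 8-byte word) records for the words that differ."""
    assert len(old) == len(new) == PAGE_SIZE
    return [
        (i, new[i:i + WORD])
        for i in range(0, PAGE_SIZE, WORD)
        if old[i:i + WORD] != new[i:i + WORD]
    ]


def apply_delta(old: bytes, delta):
    """Rebuild the new page from the previous copy plus the delta records."""
    page = bytearray(old)
    for off, word in delta:
        page[off:off + WORD] = word
    return bytes(page)
```

Under these assumptions, a page in which only two bytes changed, in different 8-byte words, yields two 8-byte records rather than a full 4 KiB page, and `apply_delta` restores the page from the previous checkpoint's copy.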



Acknowledgments

This paper is partly supported by the NSFC under Grant No. 61370104 and No. 61433019, MOE-Intel Special Research Fund of Information Technology under Grant MOE-INTEL-2012-01, and Chinese Universities Scientific Fund under Grant No. 2014TS008.

Author information

Corresponding author

Correspondence to Xuanhua Shi.

Additional information

A preliminary version containing some of the results in this paper has been published in the CSE 2014.

About this article

Cite this article

Du, Y., Shi, X., Jin, H. et al. FITDOC: fast virtual machines checkpointing with delta memory compression. J Supercomput 72, 3328–3347 (2016). https://doi.org/10.1007/s11227-015-1429-5

  • DOI: https://doi.org/10.1007/s11227-015-1429-5
