SnapFiner: A Page-Aware Snapshot System for Virtual Machines


Abstract:

Virtual machine (VM) snapshots, which enable a VM to be resumed from a previously recorded state, are an essential part of cloud infrastructures. Unfortunately, snapshot data are likely to be lost due to the high rate of disk failures, in which case the associated VM fails to recover properly. To enhance data availability without compromising application performance upon rollback recovery, it is desirable to place multiple replicas of each snapshot across dispersed disks. However, because each replica is large, replication incurs a non-trivial storage cost when managing massive snapshots in clouds. In this paper, we investigate this problem and find that the semantic gap between snapshot creation and snapshot storage is one key factor behind the high storage cost. To this end, we propose SnapFiner, a page-aware snapshot system for creating and storing massive snapshot files efficiently. First, SnapFiner acquires a fine-grained page categorization through an in-depth page exploration from three orthogonal views, thereby discovering more pages that can be excluded from the snapshot. Second, SnapFiner varies the number of replicas for different page categories based on a page-aware replication policy, achieving low storage cost without compromising availability and performance. Third, SnapFiner handles the loss of pages that are either intentionally dropped upon snapshot creation or unexpectedly damaged due to disk failures, enabling proper system execution after rollback recovery. We have implemented SnapFiner on QEMU/KVM to demonstrate its practicality for Linux guests. The experimental results show that SnapFiner reduces the storage cost by 33 percent compared to our previous work PARS, and by 69.5 percent compared to the naive approach on QEMU/KVM and HDFS.
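
The idea of varying the replica count per page category can be illustrated with a small storage-cost calculation. This is only a sketch under stated assumptions, not SnapFiner's actual policy: the abstract does not name the page categories or their replica counts, so the category names, page counts, replica counts, the 4 KiB page size, and the uniform three-replica baseline below are all hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes; assumed 4 KiB guest pages


@dataclass(frozen=True)
class PageCategory:
    name: str      # hypothetical category label, not taken from the paper
    pages: int     # number of guest pages that fall into this category
    replicas: int  # copies stored; 0 means the pages are dropped from the snapshot


def page_aware_bytes(categories):
    """Storage used when each category gets its own replica count."""
    return sum(c.pages * c.replicas * PAGE_SIZE for c in categories)


def uniform_bytes(categories, replicas=3):
    """Storage used when every page is stored with the same replica count."""
    return sum(c.pages for c in categories) * replicas * PAGE_SIZE


def saving(categories, replicas=3):
    """Fraction of storage saved relative to uniform replication."""
    return 1 - page_aware_bytes(categories) / uniform_bytes(categories, replicas)


# Illustrative numbers only; they are not measurements from the paper.
categories = [
    PageCategory("excludable", pages=400, replicas=0),
    PageCategory("recoverable", pages=300, replicas=1),
    PageCategory("critical", pages=300, replicas=3),
]

if __name__ == "__main__":
    print(page_aware_bytes(categories), uniform_bytes(categories))
    print(f"saving: {saving(categories):.1%}")
```

The point of the sketch is that the saving depends on how many pages can be assigned a lower replica count, which is why a finer-grained page categorization matters for the storage cost.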
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 29, Issue: 11, 01 November 2018)
Page(s): 2613 - 2626
Date of Publication: 30 April 2018
