Abstract
Cloud backup systems leverage data deduplication to eliminate duplicate chunks shared by many backup versions. Instead of being uploaded to the cloud, duplicate chunks are replaced with references to previously stored chunks. As a result, chunks that are consecutive in a backup stream end up scattered across several segments (the storage unit in the cloud), causing fragmentation at restore time. When users restore the chunks of the latest version, every referenced segment must be downloaded from the cloud, and the chunks in those segments that are not referenced are downloaded along with them, which degrades restore performance. To address this problem, we propose a near-exact defragmentation scheme, called NED, for deduplication-based cloud backups. The idea behind NED is to compute, for each segment, the ratio of the total length of chunks referenced by the current data stream to the segment length. If the ratio falls below a threshold, the chunks in the data stream that refer to that segment are labeled as fragments and written to new segments. By efficiently identifying fragmented chunks, NED significantly reduces the number of segments read during restore at only a slight cost in deduplication ratio. Experimental results on real-world datasets demonstrate that NED improves restore performance by 6%–105% at the cost of a 0.1%–6.5% decrease in deduplication ratio.
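The fragment-identification rule described in the abstract can be sketched as follows. This is a minimal illustration based only on the abstract, not the paper's actual implementation: the function name, data layout, and threshold value are all hypothetical. A segment's utilization by the current stream is the total length of its chunks that the stream references, divided by the segment length; chunks referring to under-utilized segments are treated as fragments and rewritten rather than deduplicated.

```python
def identify_fragmented_segments(stream_refs, segment_sizes, threshold=0.5):
    """Hypothetical sketch of NED's fragment test (names and threshold
    are illustrative, not from the paper).

    stream_refs:   list of (segment_id, chunk_length) pairs, one per
                   duplicate chunk in the current backup stream.
    segment_sizes: dict mapping segment_id -> segment length in bytes.
    Returns the set of segment ids whose utilization ratio falls below
    the threshold; chunks referring to these segments would be labeled
    as fragments and written to new segments instead of deduplicated.
    """
    # Sum the referenced chunk lengths per segment.
    referenced = {}
    for seg_id, chunk_len in stream_refs:
        referenced[seg_id] = referenced.get(seg_id, 0) + chunk_len

    # A segment is "fragmented" if the stream uses too little of it.
    return {
        seg_id
        for seg_id, ref_bytes in referenced.items()
        if ref_bytes / segment_sizes[seg_id] < threshold
    }


# Example: the stream references 200 of segment s1's 1000 bytes (ratio
# 0.2, below the threshold) but 900 of s2's 1000 bytes (ratio 0.9).
refs = [("s1", 100), ("s1", 100), ("s2", 900)]
sizes = {"s1": 1000, "s2": 1000}
print(identify_fragmented_segments(refs, sizes))  # prints {'s1'}
```

Rewriting the chunks that refer to `s1` trades a small amount of deduplication (those 200 bytes are stored again) for not having to fetch the mostly-unreferenced segment `s1` at restore time, which is the cost/benefit balance the abstract quantifies.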
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Lai, R., Hua, Y., Feng, D., Xia, W., Fu, M., Yang, Y.: A Near-Exact Defragmentation Scheme to Improve Restore Performance for Cloud Backup Systems. In: Sun, X.-H., et al. (eds.) Algorithms and Architectures for Parallel Processing (ICA3PP 2014). Lecture Notes in Computer Science, vol. 8630. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11197-1_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11196-4
Online ISBN: 978-3-319-11197-1