
Endurable SSD-Based Read Cache for Improving the Performance of Selective Restore from Deduplication Systems

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Deduplication is widely used in both enterprise storage systems and cloud storage. To overcome the performance challenge of selective restore operations in deduplication systems, a solid-state-drive-based (SSD-based) read cache can be deployed to speed up restores by dynamically caching popular restore content. Unfortunately, the frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten SSD lifetime and slow down I/O processing in the SSDs. To address this problem, we propose a new solution, LOP-Cache, which greatly improves both the write durability of SSDs and I/O performance by enlarging the proportion of long-term popular (LOP) data among the data written into the SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long period to reduce the number of cache replacements. Furthermore, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results indicate that LOP-Cache shortens the latency of selective restore by an average of 37.3% at the cost of a small SSD-based cache whose capacity is only 5.56% of the deduplicated data. Importantly, LOP-Cache improves SSD lifetime by a factor of 9.77. These results show that LOP-Cache offers a cost-efficient SSD-based read cache solution for boosting the performance of selective restore in deduplication systems.
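
The abstract does not give the LOP-Cache algorithm itself, so the following is only a toy sketch of the general idea it describes: admitting data to the SSD only once it has proven popular reduces the number of SSD writes compared with a classical scheme such as LRU. The function names, the access-count `threshold` rule, and the synthetic trace are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter, OrderedDict


def lru_writes(trace, capacity):
    """Count SSD writes for a plain LRU cache: every miss inserts the item."""
    cache = OrderedDict()
    writes = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)       # hit: refresh recency, no SSD write
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)    # evict the least recently used item
        cache[key] = True
        writes += 1                      # miss: the item is written to the SSD
    return writes


def filtered_writes(trace, capacity, threshold):
    """Count SSD writes when an item is admitted only after `threshold` accesses.

    Hypothetical admission rule for illustration; not the paper's algorithm.
    """
    seen = Counter()                     # access count per item (in memory)
    cache = set()
    writes = 0
    for key in trace:
        seen[key] += 1
        if key in cache:
            continue                     # hit: no SSD write
        if seen[key] < threshold:
            continue                     # not yet proven popular: bypass the SSD
        if len(cache) >= capacity:
            victim = min(cache, key=lambda k: seen[k])
            if seen[victim] >= seen[key]:
                continue                 # keep the more popular resident item
            cache.remove(victim)
        cache.add(key)
        writes += 1                      # admitted: the item is written to the SSD
    return writes


# Synthetic trace: two repeatedly restored items mixed with one-off items.
trace = []
for i in range(50):
    trace += ["hot1", "hot2", f"cold{2 * i}", f"cold{2 * i + 1}"]

print(lru_writes(trace, 2), filtered_writes(trace, 2, 2))  # 200 2
```

On this deliberately adversarial trace, LRU rewrites the two popular items on every round because the one-off items keep evicting them, while the filtered cache writes each popular item once and then keeps it. The size of the gap is an artifact of the synthetic trace and says nothing about the figures reported in the abstract.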

Author information

Corresponding author

Correspondence to Yun-Peng Chai.

About this article

Cite this article

Liu, J., Chai, YP., Qin, X. et al. Endurable SSD-Based Read Cache for Improving the Performance of Selective Restore from Deduplication Systems. J. Comput. Sci. Technol. 33, 58–78 (2018). https://doi.org/10.1007/s11390-018-1808-5

  • DOI: https://doi.org/10.1007/s11390-018-1808-5
