ABSTRACT
In IaaS clouds, virtual machines are booted on demand from user-provided disk images. Both the number of virtual machine images (VMIs) and their large size (GBs) challenge storage and network transfer solutions and lead to perceptibly slow VM startup times. In previous work, we proposed using small VMI caches (O(100MB)) that contain only those parts of a VMI that are actually needed for booting. Here, we present Squirrel, a fully replicated storage architecture that exploits the deduplication, compression, and snapshot features of the ZFS file system and lets us keep large quantities of VMI caches on all compute nodes of a data center with modest storage requirements (much like rodents cache precious food in many distributed places). Our evaluation shows that we can store VMI caches for all 600+ community images of Windows Azure, worth 16.4TB of raw data, within 10GB of disk space and 60MB of main memory on each compute node of our DAS-4 cluster. Extrapolating to several thousand images suggests that our approach scales.
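To illustrate why deduplication and compression can shrink a collection of similar VMI caches so much, here is a minimal Python sketch of a content-addressed block store. It is not Squirrel's implementation and does not model ZFS internals; the class `DedupStore`, the helper `make_demo_caches`, the 4096-byte block size, and the synthetic image contents are all assumptions chosen for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this sketch


class DedupStore:
    """Toy block store: identical blocks are kept once, in compressed form."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> compressed block contents
        self.images = {}  # image name -> ordered list of block digests

    def put(self, name, data):
        """Split data into fixed-size blocks and store only unseen blocks."""
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            if digest not in self.blocks:
                self.blocks[digest] = zlib.compress(block)
            digests.append(digest)
        self.images[name] = digests

    def get(self, name):
        """Reassemble an image from its block digests."""
        return b"".join(
            zlib.decompress(self.blocks[d]) for d in self.images[name]
        )

    def stored_bytes(self):
        """Bytes occupied by the unique compressed blocks."""
        return sum(len(b) for b in self.blocks.values())


def make_demo_caches():
    """Build two synthetic 'caches' that share eight blocks and differ in one."""
    shared = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
    cache_a = shared + bytes([100]) * BLOCK_SIZE
    cache_b = shared + bytes([101]) * BLOCK_SIZE
    return cache_a, cache_b


if __name__ == "__main__":
    cache_a, cache_b = make_demo_caches()
    store = DedupStore()
    store.put("a", cache_a)
    store.put("b", cache_b)
    raw = len(cache_a) + len(cache_b)
    print(f"raw bytes: {raw}, stored bytes: {store.stored_bytes()}")
```

In this toy setup the two caches share most of their blocks, so the store keeps each shared block once and only the differing blocks are added per image. Real VMI caches are far less regular than this synthetic data, so the savings shown here say nothing about the ratios reported in the abstract.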
Index Terms
- Squirrel: scatter hoarding VM image contents on IaaS compute nodes