DOI: 10.1145/2600212.2600221
research-article

Squirrel: scatter hoarding VM image contents on IaaS compute nodes

Published: 23 June 2014

ABSTRACT

In IaaS clouds, virtual machines are booted on demand from user-provided disk images. Both the number of virtual machine images (VMIs) and their large size (GBs) challenge storage and network transfer solutions and lead to perceivably slow VM startup times. In previous work, we proposed using small VMI caches (O(100MB)) that contain those parts of a VMI that are actually needed for booting. Here, we present Squirrel, a fully replicated storage architecture that exploits deduplication, compression, and snapshots from the ZFS file system, and that lets us keep large quantities of VMI caches on all compute nodes of a data center with modest storage requirements. (Much like rodents cache precious food in many distributed places.) Our evaluation shows that we can store VMI caches for all 600+ community images of Windows Azure, worth 16.4TB of raw data, within 10GB of disk space and 60MB of main memory on each compute node of our DAS-4 cluster. Extrapolation to several thousands of images predicts the scalability of our approach.
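To make the storage idea concrete, the following is a minimal sketch of block-level deduplication combined with compression, the two space-saving mechanisms the abstract names. It is not the Squirrel implementation and does not use ZFS: the `DedupStore` class, the 4096-byte block size, and the choice of SHA-256 and zlib are all assumptions made for illustration. It only shows why many VMI caches that share boot-time content may fit in far less space than their combined raw size.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


class DedupStore:
    """Toy content-addressed store: identical blocks are kept once, compressed."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # block digest -> compressed block bytes
        self.images = {}  # cache name -> ordered list of block digests

    def put(self, name, data):
        """Split data into fixed-size blocks and store each unique block once."""
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in self.blocks:
                # Only previously unseen content consumes additional space.
                self.blocks[digest] = zlib.compress(block)
            digests.append(digest)
        self.images[name] = digests

    def get(self, name):
        """Reassemble a cache from its block list."""
        return b"".join(
            zlib.decompress(self.blocks[d]) for d in self.images[name]
        )

    def stored_bytes(self):
        """Total size of the unique, compressed blocks."""
        return sum(len(b) for b in self.blocks.values())


# Two hypothetical caches that share three blocks and differ in one.
shared = bytes(range(256)) * 16  # exactly one 4096-byte block
cache_a = shared * 3 + b"A" * BLOCK_SIZE
cache_b = shared * 3 + b"B" * BLOCK_SIZE

store = DedupStore()
store.put("cache_a", cache_a)
store.put("cache_b", cache_b)
print(len(store.blocks), "unique blocks for", len(cache_a) + len(cache_b), "raw bytes")
```

In this toy example the eight raw blocks collapse to three unique ones, and each of those is compressed before being kept. A real deployment would rely on the file system for this rather than an in-memory dictionary.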


Published in

HPDC '14: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing
June 2014
334 pages
ISBN: 9781450327497
DOI: 10.1145/2600212

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

HPDC '14 paper acceptance rate: 21 of 130 submissions, 16%
Overall acceptance rate: 166 of 966 submissions, 17%
