DOI:10.1145/3225058.3225083
research-article

H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud

Published: 13 August 2018

ABSTRACT

Object storage clouds (e.g., Amazon S3) have become extremely popular due to their highly usable interface and cost-effectiveness. They are, therefore, widely used by various applications (e.g., Dropbox) to host user data. However, because object storage clouds are flat and lack the concept of a directory, it becomes necessary to maintain file meta-data and directory structure in a separate index cloud. This paper investigates the possibility of using a single object storage cloud to efficiently host the whole filesystem for users, including both the file content and directories, while avoiding meta-data loss caused by index cloud failures. We design a novel data structure, Hierarchical Hash (or H2), to natively enable the efficient mapping from filesystem operations to object-level operations. Based on H2, we implement a prototype system, H2Cloud, that can maintain large filesystems of users in an object storage cloud and support fast directory operations. Both theoretical analysis and real-world experiments confirm the efficacy of our solution: H2Cloud achieves faster directory operations than OpenStack Swift by orders of magnitude, and has similar performance to Dropbox, yet does not need a separate index cloud.
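
The abstract does not describe H2 itself, so the following is only a generic illustration of the problem it targets, not the paper's data structure. It is a minimal sketch, assuming a toy in-memory key-value map as the "object storage cloud". All names here (FlatStore, rename_dir_path_keys, DirObjects) are hypothetical. It contrasts two ways of keeping directories in a flat store: using full paths as object keys, where renaming a directory rewrites every object beneath it, and storing each directory as its own object that maps child names to opaque object IDs, where the same rename rewrites a single object.

```python
import uuid


class FlatStore:
    """Toy stand-in for an object storage cloud: a flat key -> value map."""

    def __init__(self):
        self.objects = {}
        self.writes = 0  # counts object-level PUT and DELETE operations

    def put(self, key, value):
        self.objects[key] = value
        self.writes += 1

    def get(self, key):
        return self.objects[key]

    def delete(self, key):
        del self.objects[key]
        self.writes += 1


def rename_dir_path_keys(store, old, new):
    """Full paths as keys: every object under the directory is rewritten."""
    for key in [k for k in store.objects if k.startswith(old + "/")]:
        store.put(new + key[len(old):], store.get(key))
        store.delete(key)


class DirObjects:
    """Hypothetical layout: each directory is one object holding
    a map of child name -> child object ID."""

    def __init__(self, store):
        self.store = store
        self.root = self._new_dir()

    def _new_dir(self):
        oid = uuid.uuid4().hex
        self.store.put(oid, {})
        return oid

    def _link(self, parent, name, oid):
        # Read-modify-write of the parent directory object.
        entries = dict(self.store.get(parent))
        entries[name] = oid
        self.store.put(parent, entries)

    def mkdir(self, parent, name):
        oid = self._new_dir()
        self._link(parent, name, oid)
        return oid

    def write_file(self, parent, name, data):
        oid = uuid.uuid4().hex
        self.store.put(oid, data)
        self._link(parent, name, oid)

    def rename(self, parent, old, new):
        # Only the parent directory object changes; children keep their IDs.
        entries = dict(self.store.get(parent))
        entries[new] = entries.pop(old)
        self.store.put(parent, entries)

    def listdir(self, dir_id):
        return sorted(self.store.get(dir_id))
```

With 100 files in a directory, the path-keyed rename issues two object operations per file, while the directory-object rename issues one in total. The sketch ignores concurrency, replication, and failure handling, which a real system would have to address.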

        • Published in

          ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
          August 2018
          945 pages
          ISBN:9781450365109
          DOI:10.1145/3225058

          Copyright © 2018 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          ICPP '18 paper acceptance rate: 91 of 313 submissions, 29%. Overall acceptance rate: 91 of 313 submissions, 29%.
