DOI: 10.1145/1048935.1050203
Article

An Efficient Data Location Protocol for Self-Organizing Storage Clusters

Published: 15 November 2003

ABSTRACT

Component additions and failures are common in large-scale storage clusters in production environments. To improve availability and manageability, we investigate and compare data location schemes for a large self-organizing storage cluster that can quickly adapt to the addition or departure of storage nodes. We further present an efficient location scheme that differentiates between small and large file blocks, reducing management overhead compared with uniform strategies. In our protocol, small blocks, which are typically numerous, are placed through consistent hashing. Large blocks, which are far fewer in practice, are placed through a usage-based policy, and their locations are tracked by Bloom filters. The proposed scheme improves storage utilization even when cluster nodes are non-uniform. To achieve high scalability and fault resilience, the protocol is fully distributed, relies only on soft state, and supports data replication. We demonstrate the effectiveness and efficiency of this protocol through trace-driven simulation.
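The hybrid idea in the abstract can be illustrated with a toy, single-process sketch in Python. This is not the authors' protocol. The following are all assumptions: the size cutoff (`threshold`, 64 KiB by default), the number of virtual nodes per server (`vnodes`), the Bloom filter parameters, and the "lowest fraction of capacity used" rule that stands in for the paper's usage-based policy. The names `HybridLocator` and `BloomFilter` are hypothetical. The `stored` sets simulate asking a candidate node whether it really holds a block. That check is needed because a Bloom filter can report false positives but never false negatives. Replication and node departure are omitted.

```python
from __future__ import annotations

import bisect
import hashlib


def _h(data: str, seed: int = 0) -> int:
    """64-bit hash taken from SHA-1; different seeds act as independent hash functions."""
    digest = hashlib.sha1(f"{seed}:{data}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class BloomFilter:
    """Hypothetical fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, m_bits: int = 8192, k: int = 4):
        self.m, self.k = m_bits, k  # assumed sizes, not taken from the paper
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str) -> list[int]:
        return [_h(key, seed) % self.m for seed in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class HybridLocator:
    """Hypothetical locator: consistent hashing for small blocks, usage-based
    placement plus per-node Bloom filters for large blocks."""

    def __init__(self, capacities: dict[str, int], threshold: int = 64 * 1024, vnodes: int = 32):
        self.threshold = threshold  # assumed small/large cutoff in bytes
        self.capacity = dict(capacities)
        self.used = {n: 0 for n in capacities}
        self.filters = {n: BloomFilter() for n in capacities}
        # Simulates each node's local knowledge of the blocks it actually holds.
        self.stored: dict[str, set[str]] = {n: set() for n in capacities}
        # Consistent-hashing ring: several virtual points per physical node.
        self.ring = sorted((_h(f"{n}#{i}"), n) for n in capacities for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def _ring_owner(self, block_id: str) -> str:
        # First virtual point clockwise from the block's hash, wrapping around.
        i = bisect.bisect_right(self.points, _h(block_id)) % len(self.ring)
        return self.ring[i][1]

    def place(self, block_id: str, size: int) -> str:
        if size < self.threshold:
            node = self._ring_owner(block_id)
        else:
            # Stand-in usage policy: node with the lowest fraction of capacity used.
            node = min(self.used, key=lambda n: self.used[n] / self.capacity[n])
            self.filters[node].add(block_id)
        self.used[node] += size
        self.stored[node].add(block_id)
        return node

    def locate(self, block_id: str, size: int) -> str | None:
        # Assumes the client knows the block's size (e.g. from file metadata).
        if size < self.threshold:
            return self._ring_owner(block_id)
        for node, bf in self.filters.items():
            # A filter hit is only a candidate; confirm to rule out false positives.
            if bf.might_contain(block_id) and block_id in self.stored[node]:
                return node
        return None
```

Small-block lookups need no per-block state. Large-block lookups consult only compact per-node summaries. In this sketch the filters can be rebuilt from each node's local contents, so they behave as soft state. The abstract does not say how the paper disseminates or refreshes them.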


  Published in

    ACM Conferences
    SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing
    November 2003
    859 pages
    ISBN:1581136951
    DOI:10.1145/1048935

    Copyright © 2003 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    SC '03 paper acceptance rate: 60 of 207 submissions, 29%. Overall acceptance rate: 1,516 of 6,373 submissions, 24%.
