Hash-based labeling techniques for storage scaling

The VLDB Journal

Abstract.

Scalable storage architectures allow for the addition or removal of storage devices to increase storage capacity and bandwidth or to retire older devices. Assuming random placement of data objects across multiple storage devices of a storage pool, our optimization objective is to redistribute a minimum number of objects after scaling the pool. In addition, a uniform distribution, and hence a balanced load, should be ensured after redistribution. Moreover, the redistributed objects should be retrieved efficiently during the normal mode of operation: in one I/O access and with low-complexity computation. To achieve this, we propose an algorithm called random disk labeling (RDL), based on double hashing, where storage can be added or removed without any increase in complexity. We compare RDL with other proposed techniques and demonstrate its effectiveness through experimentation.
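The abstract gives no algorithmic detail, so the following is only a minimal sketch of how a double-hashing probe over labeled slots could place objects on disks. It should not be read as the authors' exact RDL algorithm. The slot count `M`, the helper `_hash`, the function `locate`, and the use of SHA-256 are all assumptions made for illustration. Disks are assumed to occupy some of `M` slots, with `M` prime, and an object is assigned to the first occupied slot on its probe sequence.

```python
import hashlib
from typing import Mapping

M = 101  # hypothetical number of label slots; prime so every probe step cycles through all slots


def _hash(key: str, salt: str) -> int:
    """Hypothetical helper: derive an integer from a key and a salt via SHA-256."""
    digest = hashlib.sha256((salt + ":" + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def locate(key: str, labels: Mapping[int, str], m: int = M) -> str:
    """Return the disk for `key` by double hashing over the labeled slots.

    `labels` maps an occupied slot number (0..m-1) to a disk name.
    The probe sequence is start, start+step, start+2*step, ... (mod m);
    the object belongs to the first slot on that sequence that holds a disk.
    """
    if not labels:
        raise ValueError("storage pool is empty")
    start = _hash(key, "h1") % m
    step = 1 + _hash(key, "h2") % (m - 1)  # step in 1..m-1, never 0
    for i in range(m):
        slot = (start + i * step) % m
        if slot in labels:
            return labels[slot]
    # With m prime and step in 1..m-1 the loop visits every slot exactly once.
    raise AssertionError("unreachable: probe sequence covers all slots")


# Example pool: four disks placed at arbitrarily chosen slots (assumed values).
pool = {3: "disk0", 17: "disk1", 42: "disk2", 88: "disk3"}
disk_for_object = locate("object-123", pool)
```

Under these assumptions, labeling a new slot moves only the objects whose probe sequence reaches that slot before their old one, and unlabeling a slot moves only the objects that were stored on the removed disk. Lookup stays a pure computation over the label set, so no per-object directory is consulted.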



Author information

Corresponding author

Correspondence to Shu-Yuen D. Yao.

Additional information

Received: 23 June 2003, Accepted: 16 February 2004, Published online: 23 June 2004

Edited by: G. Alonso

This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC), IIS-0082826 (ITR), IIS-0238560 (CAREER), IIS-0324955 (ITR), and IIS-0307908 and unrestricted cash gifts from Okawa Foundation and Microsoft. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

About this article

Cite this article

Yao, SY.D., Shahabi, C. & Larson, PÅ. Hash-based labeling techniques for storage scaling. The VLDB Journal 14, 222–237 (2005). https://doi.org/10.1007/s00778-004-0124-6

  • DOI: https://doi.org/10.1007/s00778-004-0124-6
