
Snowball: Scalable Storage on Networks of Workstations with Balanced Load

Distributed and Parallel Databases

Abstract

Networks of workstations are an emerging architectural paradigm for high-performance parallel and distributed systems. Exploiting networks of workstations for massive data management poses exciting challenges. We consider here the problem of managing record-structured data in such an environment. An important application that our approach supports is managing collections of HTML documents on a cluster of WWW servers. The records are accessed by a dynamically growing set of clients based on a search key (e.g., a URL). To scale up the throughput of client accesses with approximately constant response time, the records, and thus also their access load, are dynamically redistributed across a growing set of workstations. The paper addresses two problems of realistic workloads: skewed access frequencies to the records, and evolving access patterns in which previously cold records may become hot and vice versa. Our solution incorporates load tracking at different levels of granularity and automatically chooses the appropriate granularity for dynamic data migrations. Experimental results based on a detailed simulation model show that our method provides scalable cost/performance and allows the level of cost/performance to be controlled explicitly.
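The idea of tracking load at more than one granularity and then picking the unit of migration can be illustrated with a small sketch. This is not the Snowball algorithm itself, which the abstract does not specify; it is a hypothetical heuristic written only to make the idea concrete. The names `LoadTracker` and `choose_migration`, the two granularities (record and bucket), and the "shed half the load difference" rule are all assumptions introduced for this example.

```python
from collections import defaultdict


class LoadTracker:
    """Hypothetical per-server tracker: counts accesses per record and per bucket."""

    def __init__(self):
        self.record_hits = defaultdict(int)  # fine granularity
        self.bucket_hits = defaultdict(int)  # coarse granularity
        self.bucket_of = {}                  # record key -> bucket id

    def access(self, key, bucket):
        """Register one client access to `key`, which lives in `bucket`."""
        self.bucket_of[key] = bucket
        self.record_hits[key] += 1
        self.bucket_hits[bucket] += 1

    def total(self):
        return sum(self.bucket_hits.values())


def choose_migration(source, target_load):
    """Pick what to move from an overloaded server to a lighter one.

    Returns ("bucket", bucket_id), ("records", [keys]), or None.
    Assumed rule: shed at most half of the load difference, so the
    target never ends up hotter than the source.
    """
    excess = (source.total() - target_load) / 2
    if excess <= 0:
        return None

    # Coarse granularity first: move the hottest bucket that fits the excess.
    fitting = [(hits, b) for b, hits in source.bucket_hits.items() if hits <= excess]
    if fitting:
        return ("bucket", max(fitting, key=lambda pair: pair[0])[1])

    # Every bucket is too hot to move whole (skew): fall back to single
    # records, taken greedily from the hottest bucket.
    hot_bucket = max(source.bucket_hits, key=source.bucket_hits.get)
    keys = sorted(
        (k for k, b in source.bucket_of.items() if b == hot_bucket),
        key=source.record_hits.get,
        reverse=True,
    )
    picked, moved = [], 0
    for key in keys:
        hits = source.record_hits[key]
        if moved + hits <= excess:
            picked.append(key)
            moved += hits
    return ("records", picked) if picked else None
```

In this sketch a whole bucket is migrated when doing so does not overshoot, which keeps bookkeeping cheap; only when skew concentrates the load in one bucket does the heuristic drop to record granularity. A real system would also need to age the counters so that records that turn cold stop counting as hot.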




Cite this article

Vingralek, R., Breitbart, Y. & Weikum, G. Snowball: Scalable Storage on Networks of Workstations with Balanced Load. Distributed and Parallel Databases 6, 117–156 (1998). https://doi.org/10.1023/A:1008609030195
