Dynamic Data Reallocation for Skew Management in Shared-Nothing Parallel Databases

Distributed and Parallel Databases

Abstract

The shared-nothing parallel database architecture is gaining wide popularity due to its scalability and increased data availability. However, to utilize parallelism efficiently in such an architecture, independent data sets must be assigned to different processing nodes. Initially, this can be achieved with a careful partitioning scheme that allocates disjoint data sets to different processors. Variations in the data access pattern, however, may leave some processors overloaded while others are underloaded. This skew in data access decreases the effective parallelism and eventually degrades overall performance. A number of solutions have been proposed that periodically re-allocate data to remove the access skew. Most of them perform either static re-allocation, which requires the system to be taken off-line, or dynamic but non-transactional re-allocation. In this paper, we introduce a dynamic and transactional re-allocation scheme based on the work on disk cooling in shared memory architecture by Scheuermann et al. The proposed scheme enhances the effective parallelism in the system regardless of variations in the access pattern. It detects access skew as it occurs and re-allocates data partitions to underloaded processing elements on the fly. Only the block being moved becomes unavailable. In addition, mutual consistency among transactions concurrent with the re-allocation event is preserved. The scheme also uses replication as an additional cooling mechanism to help distribute access load over multiple replicas. We conducted a series of simulation experiments to study the behavior of shared-nothing parallel database systems with and without the proposed dynamic re-allocation scheme. We also experimented with several replication strategies to measure their impact on system performance. Finally, we studied the effect of different concurrency control strategies on the efficiency of dynamic re-allocation.
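
The idea of detecting access skew and moving a block to an underloaded processing element can be illustrated with a minimal sketch. This is not the scheme evaluated in the paper: the function names `node_heat` and `cool_once`, the per-block access counts, and the `threshold` parameter are all assumptions made for illustration, and the sketch ignores transactions, concurrency control, and replication entirely.

```python
from collections import defaultdict


def node_heat(placement, block_heat):
    """Sum per-block access counts ("heat") into a per-node load."""
    heat = defaultdict(float)
    for block, node in placement.items():
        heat[node] += block_heat.get(block, 0.0)
    return heat


def cool_once(placement, block_heat, nodes, threshold=1.25):
    """Perform at most one cooling step (hypothetical policy).

    If the hottest node's load exceeds threshold * mean load, move the
    hottest block that still fits into the load gap from the hottest
    node to the coolest node. Mutates `placement` in place and returns
    (block, source, destination), or None if nothing was moved.
    """
    heat = node_heat(placement, block_heat)
    loads = {n: heat.get(n, 0.0) for n in nodes}
    mean = sum(loads.values()) / len(loads)
    src = max(loads, key=loads.get)
    dst = min(loads, key=loads.get)
    if mean == 0 or loads[src] <= threshold * mean:
        return None  # no significant skew detected

    # Only consider blocks whose heat is smaller than the load gap, so the
    # destination does not end up hotter than the source was.
    gap = loads[src] - loads[dst]
    candidates = [
        b for b, n in placement.items()
        if n == src and 0 < block_heat.get(b, 0.0) < gap
    ]
    if not candidates:
        return None
    block = max(candidates, key=lambda b: block_heat[b])
    placement[block] = dst
    return block, src, dst
```

Calling `cool_once` repeatedly until it returns `None` gives a simple greedy rebalancing loop. In a real system each move would additionally have to be coordinated with concurrent transactions, which is the part this sketch leaves out.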

References

  1. W. Alexander et al., "Process and dataflow control in distributed data-intensive systems," in Proceedings of ACM SIGMOD Conference, June 1988.

  2. Liz Chambers and Dave Cracknell, "Parallel features of NonStop SQL," in Second International Conference on Parallel and Distributed Systems, pp. 69–70, 1993.

  3. G. Copeland and T. Keller, "A comparison of high-availability media recovery techniques," in Proc. of ACM SIGMOD Conference, June 1989.

  4. David DeWitt et al., "GAMMA - A high performance dataflow database machine," in Proceedings of the 1986 VLDB Conference, Aug. 1986.

  5. David DeWitt et al., "The gamma database machine project," IEEE Transactions on Knowledge and Data Engineering, vol. 2, no. 1, March 1990.

  6. David DeWitt and Jim Gray, "Parallel database systems: The future of high performance database systems," Communications of ACM, vol. 35, no. 6, pp. 85–98, June 1992.

  7. C. Faloutsos and P. Bhagwat, "Declustering using fractals," in First International Conference on Parallel and Distributed Information Systems, 1993, pp. 18–28.

  8. P.A. Franaszek, J.T. Robinson, and A. Thomasian, "Concurrency control for high contention environment," ACM Transactions on Database Systems, vol. 17, no. 2, pp. 304–345, June 1992.

  9. Abdelsalam Helal, Ku Tunghui, and Ramez Elmasri, "Comparative performance analysis of exclusive and balanced concurrency control algorithms," in International Conference on Parallel and Distributed Systems, Taipei, Taiwan, Dec. 1993, pp. 227–234.

  10. Abdelsalam Helal, Ku Tunghui, and Jud Fortner, "Quasi-dynamic two-phase locking," in The International Conference on Information and Knowledge Management, Gaithersburg, Maryland, Nov. 1994, pp. 211–218.

  11. Abdelsalam Helal and Jud Fortner, "Achieving scalability in highly contentious database systems," Information Sciences, Elsevier Publishing, pp. 39–61, Feb. 1996.

  12. Abdelsalam Helal and Jud Fortner, "Scaletool: A GUI-based database scalability evaluation tool," in The IASTED International Conference on Modelling and Simulation, Pittsburgh, Pennsylvania, April 1996.

  13. Hui-I Hsiao and David DeWitt, "A performance study of three availability data replication strategies," in First International Conference on Parallel and Distributed Systems, pp. 18–28, 1991.

  14. Peter Scheuermann, Gerhard Weikum, and Peter Zabback, "Disk cooling in parallel disk systems," Bulletin of the Technical Committee on Data Engineering, vol. 17, no. 3, pp. 29–40, Sept. 1994.

  15. J.A. Solworth and C.U. Orji, "Distorted mirrors," in First International Conference on Parallel and Distributed Information Systems, pp. 10–17, 1991.

  16. Patrick Valduriez, "Parallel database systems: Open problems and new issues," Distributed and Parallel Databases, vol. 1, pp. 137–165, 1992.

  17. Patrick Valduriez, "Parallel database systems: The case for shared-something," IEEE, pp. 460–465, 1993.

About this article

Cite this article

Helal, A., Yuan, D. & El-Rewini, H. Dynamic Data Reallocation for Skew Management in Shared-Nothing Parallel Databases. Distributed and Parallel Databases 5, 271–288 (1997). https://doi.org/10.1023/A:1008637328830

  • DOI: https://doi.org/10.1023/A:1008637328830