Fast parallel algorithms for processing of joins

  • Session 10: Algorithms, Architectures And Performance III
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 297)

Abstract

We present and analyze some innovative techniques for processing a join (or a semi-join) in a parallel computing environment. Our algorithms employ perfect hashing and, in some cases, copying of data within a group of processors, or filtering of the data as they move through the network. By using the combinatorial properties of hashing, we are able to prove almost optimal speedup, with high probability, when some uniformity assumptions hold for the data. Even in the absence of these assumptions, our techniques achieve sub-optimal speedup and can be used as practical heuristics.
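
As a rough illustration of the general idea of hash-based parallel join processing, the sketch below partitions two relations by hashing the join attribute, so that matching tuples land on the same (simulated) processor and each processor can join its share independently. It is not the authors' algorithm: it uses Python's ordinary built-in `hash` rather than perfect hashing, runs the "processors" sequentially, and omits the data copying and in-network filtering mentioned in the abstract. All function names (`partition`, `local_join`, `partitioned_join`, `semi_join`) and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict


def partition(relation, key_index, num_workers):
    """Route each tuple to a worker bucket by hashing its join-key value."""
    buckets = [[] for _ in range(num_workers)]
    for row in relation:
        buckets[hash(row[key_index]) % num_workers].append(row)
    return buckets


def local_join(r_rows, s_rows, r_key, s_key):
    """Build/probe hash join on one worker's share of the data."""
    table = defaultdict(list)
    for r in r_rows:                      # build phase on R's fragment
        table[r[r_key]].append(r)
    # probe phase: concatenate each S tuple with every matching R tuple
    return [r + s for s in s_rows for r in table.get(s[s_key], [])]


def partitioned_join(R, S, r_key, s_key, num_workers=4):
    """Equi-join R and S; workers are simulated by a sequential loop."""
    r_parts = partition(R, r_key, num_workers)
    s_parts = partition(S, s_key, num_workers)
    out = []
    for r_rows, s_rows in zip(r_parts, s_parts):
        out.extend(local_join(r_rows, s_rows, r_key, s_key))
    return out


def semi_join(R, S, r_key, s_key):
    """Keep only the R tuples whose key appears somewhere in S."""
    keys = {s[s_key] for s in S}
    return [r for r in R if r[r_key] in keys]


# Hypothetical toy relations; the join attribute is column 0 in both.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = partitioned_join(R, S, r_key=0, s_key=0)
reduced = semi_join(R, S, r_key=0, s_key=0)
```

Because both relations are partitioned with the same hash function, tuples with equal keys always meet in the same bucket, so no cross-worker comparison is needed. How evenly the work is spread depends on how uniformly the keys hash, which is where uniformity assumptions of the kind the abstract mentions come into play.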

This research was supported in part by the NSF grant DCR 8503497 and by the Ministry of Industry, Energy and Technology of Greece.

Editor information

E. N. Houstis, T. S. Papatheodorou, C. D. Polychronopoulos

Copyright information

© 1988 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shasha, D., Spirakis, P. (1988). Fast parallel algorithms for processing of joins. In: Houstis, E.N., Papatheodorou, T.S., Polychronopoulos, C.D. (eds) Supercomputing. ICS 1987. Lecture Notes in Computer Science, vol 297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-18991-2_55

  • DOI: https://doi.org/10.1007/3-540-18991-2_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-18991-6

  • Online ISBN: 978-3-540-38888-3

  • eBook Packages: Springer Book Archive
