Abstract
The problem of detecting and removing duplicate tuples within a parallel database has far-reaching implications. An efficient solution to this problem would benefit many areas of Computer Science. We compare the performance of three algorithms for duplicate detection and removal in a ring-connected, shared-nothing, parallel database system. Each algorithm uses a different pre-processing method to reduce the size of the data set that must be processed. We show that, in this parallel environment, the pre-processing used by the algorithms we tested reduces run-time too little to offset the added cost of its execution.
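To make the setting concrete, the following is a minimal sketch of duplicate removal across nodes arranged in a ring, where each node holds only its own partition. It is not one of the three algorithms compared in the paper, which the abstract does not describe. The function name `ring_dedup`, the packet-passing scheme, and the rule that the lowest-numbered node keeps a shared tuple are all assumptions made for illustration.

```python
def ring_dedup(partitions):
    """Simulate duplicate removal over nodes connected in a ring.

    partitions: one list of hashable tuples per node (hypothetical layout).
    Returns one set per node; each distinct tuple survives on exactly
    one node, the lowest-numbered node that originally held it.
    """
    p = len(partitions)

    # Phase 1: each node removes duplicates within its own partition.
    local = [set(part) for part in partitions]

    # Each node emits a read-only snapshot of its locally unique tuples.
    packets = [(i, frozenset(local[i])) for i in range(p)]

    # Phase 2: packets travel p - 1 hops, so every node sees every
    # other node's snapshot exactly once.
    for _ in range(p - 1):
        # Node i receives the packet previously held by its predecessor.
        packets = [packets[(i - 1) % p] for i in range(p)]
        for i, (origin, tuples) in enumerate(packets):
            # Assumed tie-break: the lower-numbered node keeps the tuple.
            if origin < i:
                local[i] -= tuples

    return local


if __name__ == "__main__":
    nodes = [
        [(1, "a"), (2, "b"), (1, "a")],
        [(2, "b"), (3, "c")],
        [(3, "c"), (1, "a"), (4, "d")],
    ]
    for node_id, kept in enumerate(ring_dedup(nodes)):
        print(node_id, sorted(kept))
```

In this sketch every node forwards its whole locally unique set around the ring, so communication grows with the size of the data. A pre-processing step of the kind the abstract mentions would aim to shrink that data before it is processed, at the cost of running the extra step.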
The work of Dr. Wayne Patterson and Kevin Grant was funded in part by The Louisiana Consortium for Computing Science and Engineering (LCCSE).
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abdelguerfi, M., Grant, K., Murphy, E., Patterson, W., Stelly, J. (1993). “Duplicate deletion in a ring connected, shared-nothing, parallel database system”. In: Mařík, V., Lažanský, J., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 1993. Lecture Notes in Computer Science, vol 720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57234-1_13
DOI: https://doi.org/10.1007/3-540-57234-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57234-3
Online ISBN: 978-3-540-47982-6
eBook Packages: Springer Book Archive