Abstract
The intersection of large ordered sets is a common problem in the evaluation of Boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and López-Ortiz [SODA 2000/ALENEX 2001], by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corroborate and complete the practical study from Demaine et al. on comparison-based intersection algorithms. Second, we show that in practice replacing binary search and galloping (one-sided binary) search [4] by interpolation search improves the performance of each of the main intersection algorithms. Third, we introduce and test variants of interpolation search, which results in an even better intersection algorithm.
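To make the idea in the abstract concrete, here is a minimal Python sketch of intersecting two sorted lists where the inner search can be either galloping search or interpolation search. It is not the authors' implementation or any of the variants they test. The function names (`gallop_search`, `interpolation_search`, `intersect`) are hypothetical, and the code assumes sorted lists of distinct integers.

```python
from bisect import bisect_left


def gallop_search(arr, target, lo=0):
    """Smallest index i >= lo with arr[i] >= target (len(arr) if none)."""
    n = len(arr)
    if lo >= n:
        return n
    if arr[lo] >= target:
        return lo
    # Invariant: arr[prev] < target. Double the step until we overshoot.
    prev, step = lo, 1
    nxt = lo + 1
    while nxt < n and arr[nxt] < target:
        prev = nxt
        step *= 2
        nxt = prev + step
    # The answer lies in (prev, min(nxt, n)]; finish with binary search.
    return bisect_left(arr, target, prev + 1, min(nxt, n))


def interpolation_search(arr, target, lo=0):
    """Same contract as gallop_search, probing by linear interpolation.

    Assumption for this sketch: arr holds integers.
    """
    hi = len(arr) - 1
    if lo > hi or arr[hi] < target:
        return len(arr)
    if arr[lo] >= target:
        return lo
    # Invariant: arr[lo] < target <= arr[hi], so arr[hi] - arr[lo] > 0.
    while hi - lo > 1:
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        pos = max(lo + 1, min(hi - 1, pos))  # keep the probe strictly inside
        if arr[pos] < target:
            lo = pos
        else:
            hi = pos
    return hi


def intersect(a, b, search=interpolation_search):
    """Intersect two sorted lists by alternately searching one in the other."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = search(a, b[j], i)  # skip every element of a below b[j]
        else:
            j = search(b, a[i], j)  # skip every element of b below a[i]
    return result


a = [1, 3, 4, 7, 10, 15, 22, 40]
b = [3, 8, 10, 11, 22, 23, 40, 41]
print(intersect(a, b))                        # interpolation search
print(intersect(a, b, search=gallop_search))  # galloping search
```

Passing the search routine as a parameter mirrors the experiment described above: the intersection loop stays the same and only the search primitive is swapped.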
References
Baeza-Yates, R.A.: A Fast Set Intersection Algorithm for Sorted Sequences. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 400–408. Springer, Heidelberg (2004)
Baeza-Yates, R.A., Salinger, A.: Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences. In: Proceedings of 12th International Conference on String Processing and Information Retrieval (SPIRE), pp. 13–24 (2005)
Barbay, J., Kenyon, C.: Adaptive Intersection and t-Threshold Problems. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 390–399 (2002)
Bentley, J.L., Yao, A.C.-C.: An almost optimal algorithm for unbounded searching. Information Processing Letters 5(3), 82–87 (1976)
Blandford, D.K., Blelloch, G.E.: Compact Representations of Ordered Sets. In: Daniel, K. (ed.) ACM/SIAM Symposium on Discrete Algorithms (SODA), pp. 11–19 (2004)
Demaine, E.D., Jones, T.R., Patrascu, M.: Interpolation search for non-independent data. In: Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 529–530 (2004)
Demaine, E.D., López-Ortiz, A., Munro, J.I.: Adaptive set intersections, unions, and differences. In: Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 743–752 (2000)
Demaine, E.D., López-Ortiz, A., Munro, J.I.: Experiments on Adaptive set intersections for text retrieval systems. In: Buchsbaum, A.L., Snoeyink, J. (eds.) ALENEX 2001. LNCS, vol. 2153, pp. 91–104. Springer, Heidelberg (2001)
Estivill-Castro, V., Wood, D.: A survey of adaptive sorting algorithms. ACM Computing Surveys 24(4), 441–476 (1992)
Frakes, W., Baeza-Yates, R.: Information Retrieval. Prentice-Hall, Englewood Cliffs (1992)
Gonnet, G., Rogers, L., George, G.: An algorithmic and complexity analysis of interpolation search. Acta Informatica 13(1), 39–52 (1980)
Hwang, F.K., Lin, S.: Optimal Merging of 2 Elements with n Elements. Acta Informatica 1, 145–158 (1971)
Hwang, F.K., Lin, S.: A Simple Algorithm for Merging Two Disjoint Linearly-Ordered Sets. SIAM Journal on Computing 1, 31–39 (1972)
Hwang, F.K.: Optimal Merging of 3 Elements with n Elements. SIAM Journal on Computing 9, 298–320 (1980)
Manber, U., Myers, G.: Suffix arrays: A new method for on-line string searches. In: Proceedings of the 1st Symposium on Discrete Algorithms (SODA), pp. 319–327 (1990)
Perl, Y., Itai, A., Avni, H.: Interpolation search–A log log n search. CACM 21(7), 550–554 (1978)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barbay, J., López-Ortiz, A., Lu, T. (2006). Faster Adaptive Set Intersections for Text Searching. In: Àlvarez, C., Serna, M. (eds) Experimental Algorithms. WEA 2006. Lecture Notes in Computer Science, vol 4007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11764298_13
DOI: https://doi.org/10.1007/11764298_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34597-8
Online ISBN: 978-3-540-34598-5
eBook Packages: Computer Science, Computer Science (R0)