Abstract
Many database applications require sorting a table (or relation) over multiple sort orders. Some examples include creation of multiple indices on a relation, generation of multiple reports from a table, evaluation of a complex query that involves multiple instances of a relation, and batch processing of a set of queries. In this paper, we study how to optimize multiple sortings of a table. We investigate the correlation between sort orders and exploit sort-sharing techniques of reusing the (partial) work done to sort a table on a particular order for another order. Specifically, we introduce a novel and powerful evaluation technique, called cooperative sorting, that enables sort sharing between seemingly non-related sort orders. Subsequently, given a specific set of sort orders, we determine the best combination of various sort-sharing techniques so as to minimize the total processing cost. We also develop techniques to make a traditional query optimizer extensible so that it will not miss the truly cheapest execution plan with the sort-sharing (post-) optimization turned on. We demonstrate the efficiency of our ideas with a prototype implementation in PostgreSQL and evaluate the performance using both TPC-DS benchmark and synthetic data. Our experimental results show significant performance improvement over the traditional evaluation scheme.
Similar content being viewed by others
References
Agarwal, S., Agrawal, R., Deshpande, P., Gupta, A., Naughton, J.F., Ramakrishnan, R., Sarawagi, S.: On the computation of multidimensional aggregates. In: VLDB (1996)
Cao, Y., Das, G.C., Chan, C.-Y., Tan, K.-L.: Optimizing complex queries with multiple relation instances. In: SIGMOD (2008)
Charikar, M., Chaudhuri, S., Motwani, R., Narasayya, V.: Towards estimation error guarantees for distinct values. In: PODS (2000)
Charikar, M., Chekuri, C., Dai, Z., Goel, A., Guha, S., Li, M.: Approximation algorithms for directed steiner problems. In: Journal of Algorithms, pp. 73–91 (1999)
Dinsmore R.: Longer strings for sorting. Commun. ACM 8(1), 48 (1965)
Estivill-Castro V., Wood D.: A survey of adaptive sorting algorithms. ACM Comput. Surv. (CSUR) 24(4), 441–476 (1992)
Frazer W., Wong C.: Sorting by natural selection. Commun. ACM 15(10), 910–913 (1972)
Georgiadis L.: Arborescence optimization problems solvable by edmonds’ algorithm. Theor. Comput. Sci. 301(1–3), 427–437 (2003)
Gibbons, P.B.: Distinct sampling for highly-accurate answers to distinct values queries and event reports. In: VLDB (2001)
Graefe G.: Implementing sorting in database systems. ACM Comput. Surv. 38(3), 10 (2006)
Graefe, G., McKenna, W.J.: The volcano optimizer generator: Extensibility and efficient search. In: ICDE (1993)
Guravannavar, R., Sudarshan, S.: Reducing order enforcement cost in complex query plans. In: ICDE (2007)
Hsieh M.-I., Wu E.H.-K., Tsai M.-F.: Fasterdsp: a faster approximation algorithm for directed steiner tree problem. J. Inf. Sci. Eng. 22(6), 1409–1425 (2006)
Karp, R.M.: Reducibility among combinatorial problems. In: Complexity of Computer Computations, pp. 85–103. Plenum Press (1972)
Knuth D.E.: The Art of Computer Programming, Volume 3: Sorting and Searching. 2nd edn. Addison-Wesley, Redwood City (1998)
Kooi, R.P.: The optimization of queries in relational databases. PhD thesis, Case Western Reserve University (1980)
Larson P.: External sorting: Run formation revisited. IEEE Trans. Knowl. Data Eng. 15(4), 961–972 (2003)
Neumann, T., Moerkotte, G.: A combined framework for grouping and order optimization. In: VLDB (2004)
Neumann, T., Moerkotte, G.: An efficient framework for order optimization. In: ICDE (2004)
Pai, V., Varman, P.: Prefetching with multiple disks for external mergesort: simulation and analysis. In: ICDE (1992)
Postgresql Offical Website. http://www.postgresql.org/
Roussopoulos N.: View indexing in relational databases. ACM Trans. Database Syst. 7(2), 258–290 (1982)
Roy, P., Seshadri, S., Sudarshan, S., Bhobe, S.: Efficient and extensible algorithms for multi query optimization. In: SIGMOD (2000)
Salzberg B.: Merging sorted runs using large main memory. Acta Informatica 27(3), 195–215 (1989)
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD (1979)
Simmen, D., Shekita, E., Malkemus, T.: Fundamental techniques for order optimization. In: SIGMOD (1996)
TPC BENCHMARK Decision Support. http://www.tpc.org/tpcds/
Ting T., Wang Y.: Multiway replacement selection sort with dynamic reservoir. Comput. J. 20(4), 298–301 (1977)
Wang, X., Cherniack, M.: Avoiding sorting and grouping in processing queries. In: VLDB (2003)
Weininger, A.: Efficient execution of joins in a star schema. In: SIGMOD (2002)
Zhang, W., Larson, P.: Dynamic memory adjustment for external mergesort. In: VLDB (1997)
Zhang, W., Larson, P.-A.: Buffering and read-ahead strategies for external mergesort. In: VLDB (1998)
Zheng L., Larson P.: Speeding up external mergesort. IEEE Trans. Knowl. Data Eng. 8(2), 322–332 (1996)
Zukowski, M., Héman, S., Nes, N., Boncz, P.A.: Cooperative scans: dynamic bandwidth sharing in a DBMS. In: VLDB (2007)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Cao, Y., Bramandia, R., Chan, CY. et al. Sort-sharing-aware query processing. The VLDB Journal 21, 411–436 (2012). https://doi.org/10.1007/s00778-011-0251-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-011-0251-9