skip to main content
article

Implementing sorting in database systems

Published: 30 September 2006 Publication History

Abstract

Most commercial database systems do (or should) exploit many sorting techniques that are publicly known, but not readily available in the research literature. These techniques improve both sort performance on modern computer systems and the ability to adapt gracefully to resource fluctuations in multiuser operations. This survey collects many of these techniques for easy reference by students, researchers, and product developers. It covers in-memory sorting, disk-based external sorting, and considerations that apply specifically to sorting in database systems.

References

[1]
Agarwal, R. C. 1996. A super scalar sort algorithm for RISC processors. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. 240--246.
[2]
Aho, A., Hopcroft, J. E., and Ullman, J. D. 1983. Data Structures and Algorithms. Addison-Wesley, Reading, MA.
[3]
Andersson, A. and Nilsson, S. 1998. Implementing radixsort. ACM J. Experimental Algorithms 3, 7.
[4]
Antoshenkov, G., Lomet, D. B., and Murray, J. 1996. Order-Preserving compression. In Proceedings of the IEEE International Conference on Data Engineering. 655--663.
[5]
Arpaci-Dusseau, A. C., Arpaci-Dusseau, R., Culler, D. E., Hellerstein, J. M., and Patterson, D. A. 1997. High-Performance sorting on networks of workstations. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. 243--254.
[6]
Baer, J.-L. and Lin, Y.-B. 1989. Improving quicksort performance with a codeword data structure. IEEE Trans. Softw. Eng. 15, 5, 622--631.
[7]
Barclay, T., Barnes, R., Gray, J., and Sundaresan, P. 1994. Loading databases using dataflow parallelism. ACM SIGMOD Rec. 23, 4, 72--83.
[8]
Bernstein, P. A. and Chiu, D.-M. W. 1981. Using semi-joins to solve relational queries. J. ACM 28, 1, 25--40.
[9]
Bitton, D. and Dewitt, D. J. 1983. Duplicate record elimination in large data files. ACM Trans. Database Syst. 8, 2, 255--265.
[10]
Blasgen, M. W., Casey, R. G., and Eswaran, K. P. 1977. An encoding method for multifield sorting and indexing. Comm. ACM 20, 11, 874--878.
[11]
Chen, P. M., Lee, E. L., Gibson, G. A., Katz, R. H., and Patterson, D. A. 1994. RAID: High-performance, reliable secondary storage. ACM Comp. Surv. 26, 2, 145--185.
[12]
Comer, D. 1979. The ubiquitous B-Tree. ACM Comp. Surv. 11, 2, 121--137.
[13]
Conner, W. M. 1977. Offset value coding. IBM Technical Disclosure Bulletin 20, 7, 2832--2837.
[14]
Cormen, T. H., Leiserson, C. E., Rivest, and R. L., Stein, C. 2001. Introduction to Algorithms, 2nd ed. Cambridge, MA. MIT Press.
[15]
Estivill-Castro, V. and Wood, D. 1992. A survey of adaptive sorting algorithms. ACM Comp. Surv. 24, 4, 441--476.
[16]
Gassner, P., Lohman, G. M., Schiefer, K. B., and Wang, Y. 1993. Query optimization in the IBM DB2 family. IEEE Data Eng. Bulletin 16, 4, 4--18.
[17]
Goldstein, J., Ramakrishnan, R., and Shaft, U. 1998. Compressing relations and indexes. In Proceedings of the IEEE International Conference on Data Engineering, 370--379.
[18]
Graefe, G. 1993. Query evaluation techniques for large databases. ACM Comp. Surv. 25, 2, 73--170.
[19]
Graefe, G. 2003. Sorting and indexing with partitioned B-Trees. In Proceedings of the Conference on Innovative Data Systems Research (CIDR). Asilomar, CA.
[20]
Graefe, G. 2003b. Executing nested queries. In Proceedings of the Datenbanksysteme für Business, Technologie und Web (BTW) Conference. Leipzig, Germany, 58--77.
[21]
Graefe, G. and Larson, P.-A. 2001. B-Tree indexes and CPU caches. In Proceedings of the IEEE International Conference On Data Engineering. Heidelberg, Germany. 349--358.
[22]
Graefe, G., Bunker, R., and Cooper, S. 1998. Hash joins and hash teams in microsoft SQL server. In Proceedings of the Conference on Very Large Databases (VLDB). 86--97.
[23]
Härder, T. 1977. A Scan-driven sort facility for a relational database system. In Proceedings of the Conference on Very Large Databases (VLDB). 236--244.
[24]
Harizopoulos, S. and Ailamaki, A. 2003. A case for staged database systems. In Proceedings of the Conference on Innovative Data Systems Research (CIDR). Asilomar, CA.
[25]
Hu, T. C. and Tucker, A. C. 1971. Optimal computer search trees and variable-length alphabetic codes. SIAM J. Appl. Math. 21, 4, 514--532.
[26]
Iyer, B. R. and Dias, D. M. 1990. System issues in parallel sorting for database systems. In Proceedings of the IEEE International Conference on Data Engineering. 246--255.
[27]
Keller, T., Graefe, G., and Maier, D. 1991. Efficient assembly of complex objects. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. 148--157.
[28]
Kitsuregawa, M., Nakayama, M., and Takagi, M. 1989. The effect of bucket size tuning in the dynamic hybrid GRACE hash join method. In Proceedings of the Conference on Very Large Databases (VLDB) Conference. 257--266.
[29]
Knuth, D. E. 1998. The Art of Computer Programming: Sorting and Searching. Addison Wesley Longman.
[30]
Kooi, R. 1980. The optimization of queries in relational databases, Ph.D. thesis, Case Western Reserve University.
[31]
Kwan, S. C. and Baer, J.-L. 1985. The I/O performance of multiway mergesort and tag sort. IEEE Trans. Comput. 34, 4, 383--387.
[32]
Larus, J. R. and Parkes, M. 2001. Using cohort scheduling to enhance server performance. Microsoft Research Tech. Rep. 39.
[33]
Larson, P.-L. 2003. External sorting: Run formation revisited. IEEE Trans. Knowl. Data Eng. 15, 4, 961--972.
[34]
Larson, P.-A. and Graefe, G. 1998. Memory management during run generation in external sorting. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. 472--483.
[35]
Lelewer, D. A. and Hirschberg, D. S. 1987. Data compression. ACM Comp. Surv. 19, 3, 261--296.
[36]
Manku, G. S., Rajagopalan, S., and Lindsay, B. G. 1998. Approximate medians and other quantiles in one pass and with limited memory. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. 426--435.
[37]
McJones, P. Ed. 1997. The 1995 SQL reunion: People, projects, and politics. SRC Tech. Note 1997-018, Digital Systems Research Center. Palo Alto, CA.
[38]
Mohan, C. and Narang, I. 1992. Algorithms for creating indexes for very large tables without quiescing updates. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. 361--370.
[39]
Nyberg, C, Barclay, T., Cvetanovic, Z., Gray, J., and Lomet, D. B. 1995. AlphaSort: A cache-sensitive parallel external sort. VLDB J. 4, 4, 603--627.
[40]
Padmanabhan, S., Malkemus, T., Agarwal, R. C., and Jhingran, A. 2001. Block-Oriented processing of relational database operations in modern computer architectures. In Proceedings of the IEEE International Conference on Data Engineering. 567--574.
[41]
Pang, H., Carey, M. J., and Livny, M. 1993. Memory-adaptive external sorting. In Proceedings of the Conference on Very Large Databases (VLDB). 618--629.
[42]
Rahman, N. and Raman, R 2000. Analysing cache effects in distribution sorting. ACM J. Experimental Algorithms 5, 14.
[43]
Rahman, N. and Raman, R 2001. Adapting radix sort to the memory hierarchy. ACM J. Experimental Algorithms 6, 7.
[44]
Salzberg, B. 1989. Merging sorted runs using large main memory. Acta Informatica 27, 3, 195--215.
[45]
Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., and Price, T. G. 1979. Access path selection in a relational database management system. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. 23--34.
[46]
Seshadri, P., Hellerstein, J. M., Pirahesh, H., Leung, T. Y. C., Ramakrishnan, R., Srivastava, D., Stuckey, P. J., and Sudarshan, S. 1996. Cost-based optimization for magic: Algebra and implementation. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. 435--446.
[47]
Simmen, D. E., Shekita, E. J., and Malkemus, T. 1996. Fundamental techniques for order optimization. In Proceedings of the Extending Database Technology Conference. 625--628.
[48]
Stonebraker, M. and Kumar, A. 1986. Operating system support for data management. IEEE Database Eng. Bulletin 9, 3, 43--50.
[49]
Vitter, J. S. 1987. Design and analysis of dynamic Huffman codes. J. ACM 34, 4, 825--845.
[50]
Zhang, W. and Larson, P.-A. 1997. Dynamic memory adjustment for external mergesort. In Proceedings of the Conference on Very Large Databases (VLDB). 376--385.
[51]
Zhang, W. and Larson, P.-A. 1998. Buffering and read-ahead strategies for external mergesort. In Proceedings of the Conference on Very Large Databases (VLDB). 523--533.
[52]
Zhang, C., Naughton, J. F., Dewitt, D. J., Luo, Q., Lohman, G. M. 2001. On supporting containment queries in relational database management systems. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. 425--436.
[53]
Zheng, L. and Larson, P.-A. 1996. Speeding Up external mergesort. IEEE Trans. Knowl. Data Eng. 8, 2, 322--332.

Cited By

View all
  • (2025)Efficiently Processing Joins and Grouped Aggregations on GPUsProceedings of the ACM on Management of Data10.1145/37096893:1(1-27)Online publication date: 11-Feb-2025
  • (2024)Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query EngineCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653368(5-17)Online publication date: 9-Jun-2024
  • (2024)A Fast Scalable Hardware Priority Queue and Optimizations for Multi-Pushes2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00038(134-140)Online publication date: 27-May-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Computing Surveys
ACM Computing Surveys  Volume 38, Issue 3
2006
129 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/1132960
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 September 2006
Published in CSUR Volume 38, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Key normalization
  2. asynchronous read-ahead
  3. compression
  4. dynamic memory resource allocation
  5. forecasting
  6. graceful degradation
  7. index operations
  8. key conditioning
  9. nested iteration

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)59
  • Downloads (Last 6 weeks)9
Reflects downloads up to 07 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Efficiently Processing Joins and Grouped Aggregations on GPUsProceedings of the ACM on Management of Data10.1145/37096893:1(1-27)Online publication date: 11-Feb-2025
  • (2024)Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query EngineCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653368(5-17)Online publication date: 9-Jun-2024
  • (2024)A Fast Scalable Hardware Priority Queue and Optimizations for Multi-Pushes2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00038(134-140)Online publication date: 27-May-2024
  • (2024)A Hybrid Approach to Group-By and Aggregation Query Execution2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825803(3799-3808)Online publication date: 15-Dec-2024
  • (2024)Deep Learning Service for Efficient Data Distribution Aware Sorting2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825633(1508-1515)Online publication date: 15-Dec-2024
  • (2024)thSORT: an efficient parallel sorting algorithm on multi-core DSPsCCF Transactions on High Performance Computing10.1007/s42514-023-00175-7Online publication date: 19-Jan-2024
  • (2023)LearnedSort as a learning-augmented SampleSort: Analysis and ParallelizationProceedings of the 35th International Conference on Scientific and Statistical Database Management10.1145/3603719.3603731(1-9)Online publication date: 10-Jul-2023
  • (2023)Robust and Efficient Sorting with Offset-value CodingACM Transactions on Database Systems10.1145/357095648:1(1-23)Online publication date: 13-Mar-2023
  • (2023)Efficient Sorting, Duplicate Removal, Grouping, and AggregationACM Transactions on Database Systems10.1145/356802747:4(1-35)Online publication date: 6-Jan-2023
  • (2023)Scalable Low-Cost Sorting Network with Weighted Bit-Streams2023 24th International Symposium on Quality Electronic Design (ISQED)10.1109/ISQED57927.2023.10129357(1-6)Online publication date: 5-Apr-2023
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media