Summary
External sorting is usually accomplished by first creating sorted runs and then merging them. In the merge phase, reading can be overlapped with writing and computation if two input buffers are used for each sorted run. If the memory is very large, the input buffers will be large, and using two input buffers per sorted run will be more efficient than using only one input buffer per run and risking reduced overlap of reading and writing. In many cases, merging time can be cut in half. We derive a formula for estimating the total merging time for a given memory size, file size, number of merge passes, and disk drive. We present an extreme example where, in spite of having two buffers per run, significant non-overlap occurs. However, we show that in realistic problems, making one merge pass with two input buffers per run is near optimal. This contradicts earlier results on merging, which do not take large memory into account.
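The trade-off described in the summary can be illustrated with a rough sketch. This is a toy cost model and not the formula derived in the article. The function name, the default disk parameters, and the modeling choices are all assumptions made for illustration: memory is split evenly among input buffers, output goes sequentially to a separate drive, one buffer per run gives no overlap of reading and writing, and two buffers per run give perfect overlap.

```python
import math


def merge_pass_time(file_bytes, memory_bytes, runs, buffers_per_run,
                    access_s=0.03, transfer_bytes_per_s=1_000_000):
    """Rough time in seconds for one merge pass (toy model, assumed parameters)."""
    if buffers_per_run not in (1, 2):
        raise ValueError("buffers_per_run must be 1 or 2")
    # Assumption: all memory is divided evenly among the input buffers.
    buffer_bytes = memory_bytes / (runs * buffers_per_run)
    # Each buffer refill pays one random access (seek plus rotational delay).
    refills = math.ceil(file_bytes / buffer_bytes)
    read_s = refills * access_s + file_bytes / transfer_bytes_per_s
    # Assumption: output is written sequentially to a separate drive.
    write_s = file_bytes / transfer_bytes_per_s
    if buffers_per_run == 2:
        # Assumption: reading overlaps perfectly with writing.
        return max(read_s, write_s)
    # Assumption: with one buffer per run, reading and writing never overlap.
    return read_s + write_s


# Large memory: 100 MB file, 10 MB of buffer space, 10 runs.
large_one = merge_pass_time(100_000_000, 10_000_000, 10, 1)
large_two = merge_pass_time(100_000_000, 10_000_000, 10, 2)

# Small memory: same file, 1 MB of buffer space, 100 runs.
small_one = merge_pass_time(100_000_000, 1_000_000, 100, 1)
small_two = merge_pass_time(100_000_000, 1_000_000, 100, 2)
```

Under these assumed numbers, the large-memory case with two buffers per run takes roughly half the time of the single-buffer case, because the extra accesses caused by halving the buffer size are small next to the transfer time. In the small-memory case the buffers are tiny, access time dominates, and doubling the number of refills makes two buffers per run slower than one.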
Salzberg, B. Merging sorted runs using large main memory. Acta Informatica 27, 195–215 (1989). https://doi.org/10.1007/BF00572988