Abstract
We revisit the classic problem of run generation. Run generation is the first phase of external-memory sorting, where the objective is to scan through the data, reorder elements using a small buffer of size M, and output runs (contiguously sorted chunks of elements) that are as long as possible.
We develop algorithms for minimizing the total number of runs (or equivalently, maximizing the average run length) when the runs are allowed to be sorted or reverse sorted. We study the problem in the online setting, both with and without resource augmentation, and in the offline setting.
First, we analyze alternating-up-down replacement selection (runs alternate between sorted and reverse sorted), which was studied by Knuth as far back as 1963. We show that this simple policy is asymptotically optimal.
Next, we give online algorithms having smaller competitive ratios with resource augmentation. We demonstrate that performance can also be improved with a small amount of foresight. Lastly, we present algorithms tailored for “nearly sorted” inputs which are guaranteed to have sufficiently long optimal runs.
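To make the alternating-up-down policy concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `alternating_up_down_runs`, the choice to start with an ascending run, and the linear scans over the buffer (rather than heaps) are all ours. The sketch keeps at most `m` elements in memory, extends the current run for as long as some buffered element can legally follow the last one written, and flips direction whenever the run has to end.

```python
from itertools import islice
from typing import Iterable, List

_END = object()  # sentinel marking an exhausted input stream


def alternating_up_down_runs(data: Iterable[int], m: int) -> List[List[int]]:
    """Split `data` into runs using a buffer of at most m elements.

    Runs alternate between ascending and descending. Starting with an
    ascending run is an assumption made for this sketch.
    """
    if m < 1:
        raise ValueError("buffer size m must be at least 1")
    stream = iter(data)
    buffer = list(islice(stream, m))  # fill the buffer with the first m elements
    runs: List[List[int]] = []
    ascending = True
    while buffer:
        run: List[int] = []
        while True:
            if not run:
                candidates = buffer  # any buffered element may start a run
            elif ascending:
                candidates = [x for x in buffer if x >= run[-1]]
            else:
                candidates = [x for x in buffer if x <= run[-1]]
            if not candidates:
                break  # nothing in the buffer can extend the current run
            # Write the element closest to the last one written.
            nxt = min(candidates) if ascending else max(candidates)
            run.append(nxt)
            buffer.remove(nxt)
            incoming = next(stream, _END)  # read one element per element written
            if incoming is not _END:
                buffer.append(incoming)
        runs.append(run)
        ascending = not ascending  # the next run goes the other way
    return runs
```

For example, `alternating_up_down_runs([3, 1, 4, 1, 5, 9, 2, 6], m=2)` partitions the eight inputs into an ascending run, a descending run, and a final ascending run. The linear scans keep the sketch short; since the analysis counts only I/Os, the in-memory search cost does not affect the number of runs produced.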
Notes
1. The external-memory (or I/O) model applies to any two levels of the memory hierarchy.
2. Data structures such as heaps can identify the smallest elements in memory. But from the perspective of minimizing I/Os, this does not matter: computation is free in the DAM model.
3. Note that for a given input, minimizing the number of runs is equivalent to maximizing the average length of runs.
4. Due to space constraints, we defer some proofs to the full version [2].
References
1. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)
2. Bender, M.A., McCauley, S., McGregor, A., Singh, S., Vu, H.T.: Run generation revisited: What goes up may or may not come down. arXiv preprint arXiv:1504.06501 (2015)
3. Chandramouli, B., Goldstein, J.: Patience is a virtue: revisiting merge and sort on modern processors. In: Proceedings International Conference on Management of Data, pp. 731–742 (2014)
4. Estivill-Castro, V., Wood, D.: A survey of adaptive sorting algorithms. ACM Comput. Surv. 24(4), 441–476 (1992)
5. Frazer, W., Wong, C.: Sorting by natural selection. Commun. ACM 15(10), 910–913 (1972)
6. Friend, E.H.: Sorting on electronic computer systems. J. ACM 3(3), 134–168 (1956)
7. Gassner, B.J.: Sorting by replacement selecting. Commun. ACM 10(2), 89–93 (1967)
8. Goetz, M.A.: Internal and tape sorting using the replacement-selection technique. Commun. ACM 6(5), 201–206 (1963)
9. Graefe, G.: Implementing sorting in database systems. ACM Comput. Surv. 38(3), 10 (2006)
10. Knuth, D.E.: Length of strings for a merge sort. Commun. ACM 6(11), 685–688 (1963)
11. Knuth, D.E.: The Art of Computer Programming: Sorting and Searching. Addison-Wesley, Reading (1998)
12. Lin, Y.C.: Perfectly overlapped generation of long runs for sorting large files. J. Parallel Distrib. Comput. 19(2), 136–142 (1993)
13. Lin, Y.C., Lai, H.Y.: Perfectly overlapped generation of long runs on a transputer array for sorting. Microprocess. Microsyst. 20(9), 529–539 (1997)
14. Mallows, C.L.: Patience sorting. Bulletin Inst. Math. Appl. 5(4), 375–376 (1963)
15. Martinez-Palau, X., Dominguez-Sal, D., Larriba-Pey, J.L.: Two-way replacement selection. Proc. VLDB Endow. 3, 871–881 (2010)
16. Wikipedia: Timsort (2004). http://en.wikipedia.org/wiki/Timsort
Acknowledgments
We gratefully acknowledge Goetz Graefe and Harumi Kuno for introducing us to this problem and for their advice. This research was supported by NSF grants CCF 1114809, CCF 1217708, IIS 1247726, IIS 1251137, CNS 1408695, CCF 1439084, CCF 0953754, IIS 1251110, CCF 1320719, and by Google Research and Sandia National Laboratories.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bender, M.A., McCauley, S., McGregor, A., Singh, S., Vu, H.T. (2015). Run Generation Revisited: What Goes Up May or May Not Come Down. In: Elbassioni, K., Makino, K. (eds) Algorithms and Computation. ISAAC 2015. Lecture Notes in Computer Science, vol 9472. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48971-0_59
DOI: https://doi.org/10.1007/978-3-662-48971-0_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48970-3
Online ISBN: 978-3-662-48971-0
eBook Packages: Computer Science (R0)