Distribution sort with randomized cycling

Published: 01 July 2006

Abstract

Parallel independent disks can enhance the performance of external memory (EM) algorithms, but the programming task is often difficult. Each disk can service only one read or write request at a time; the challenge is to keep the disks as busy as possible. In this article, we develop a randomized allocation discipline for parallel independent disks, called randomized cycling. We show how it can be used as the basis for an efficient distribution sort algorithm, which we call randomized cycling distribution sort (RCD). We prove that the expected I/O complexity of RCD is optimal. The analysis uses a novel reduction to a scenario with significantly fewer probabilistic interdependencies. We demonstrate RCD's practicality by experimental simulations. Using the randomized cycling discipline, algorithms developed for the unrealistic multihead disk model can be simulated on the realistic parallel disk model for the class of multipass algorithms, which make a complete pass through their data before accessing any element a second time. In particular, algorithms based upon the well-known distribution and merge paradigms of EM computation can be optimally extended from a single disk to parallel disks.
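The abstract names the randomized cycling discipline without spelling out its mechanics. The sketch below shows one plausible reading, offered as an assumption rather than the authors' implementation: each bucket of the distribution sort independently draws a random ordering of the disks, and successive blocks of that bucket are written to the disks in that cyclic order. The names `make_cycles`, `allocate`, and `bucket_of_block` are hypothetical and chosen only for illustration.

```python
import random


def make_cycles(num_buckets, num_disks, rng):
    """Draw one independent random disk ordering per bucket.

    Assumption: "randomized cycling" is read here as a per-bucket random
    permutation of the disks; the abstract does not give this detail.
    """
    cycles = []
    for _ in range(num_buckets):
        order = list(range(num_disks))
        rng.shuffle(order)  # uniform random permutation of the disk ids
        cycles.append(order)
    return cycles


def allocate(bucket_of_block, num_buckets, num_disks, seed=0):
    """Return the disk chosen for each block, in arrival order.

    bucket_of_block[i] is the bucket that the i-th output block belongs to.
    Each bucket walks its own cycle, so any num_disks consecutive blocks of
    one bucket land on num_disks distinct disks.
    """
    rng = random.Random(seed)
    cycles = make_cycles(num_buckets, num_disks, rng)
    next_pos = [0] * num_buckets  # position of each bucket within its cycle
    placement = []
    for b in bucket_of_block:
        disk = cycles[b][next_pos[b] % num_disks]
        next_pos[b] += 1
        placement.append(disk)
    return placement


if __name__ == "__main__":
    # Twelve blocks spread over 3 buckets, written to 4 disks.
    blocks = [0, 1, 0, 2, 1, 0, 0, 2, 1, 0, 0, 0]
    for bucket, disk in zip(blocks, allocate(blocks, 3, 4, seed=1)):
        print(f"bucket {bucket} -> disk {disk}")
```

Under this reading, each bucket is spread evenly over the disks, so a later pass can read it back with full parallelism, while the independent random orderings keep different buckets from repeatedly contending for the same disk during writes.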


Published In

Journal of the ACM, Volume 53, Issue 4
July 2006
173 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/1162349

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distribution
  2. external memory
  3. external sorting
  4. input/output
  5. merging
  6. multipass algorithms
  7. multiple disks
  8. parallel disks
  9. randomization
  10. sorting

