Abstract
In this paper we study the sorting performance of a 128-processor CRAY T3D and discuss the efficient use of the toroidal network connecting the processors. The problems we consider range from sorting one word per processor to sorting the entire memory of the machine, and we give efficient algorithms for each case. We give both algorithms that make assumptions about the distribution of the data and algorithms that make none. If the data can be assumed to be uniformly distributed, the clear winner is a method that we call a hash-and-chain sort. This algorithm sorts one million words per processor over 64 processors in less than two seconds, which compares favorably with about four seconds on a 4-processor CRAY C90 and about 17 seconds on a 64-processor Thinking Machines CM-5.
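The abstract does not describe the hash-and-chain sort itself, so the following is not the authors' algorithm. It is only a minimal sequential sketch of a generic bucket sort, included to illustrate why a uniformity assumption helps: when keys are roughly uniform, mapping each key to a bucket by its position in the key range yields buckets of similar size. The function name, the bucket count of 64 (standing in loosely for processors), and the key range are all assumptions made for illustration.

```python
import random


def bucket_sort_uniform(keys, num_buckets, key_max):
    """Sort non-negative integers below key_max.

    Illustrative only: assumes keys are roughly uniform over [0, key_max),
    so that the buckets end up with similar sizes.
    """
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        # Map the key to a bucket by its position in the key range.
        # The mapping is monotone, so bucket i holds only keys smaller
        # than every key in bucket i + 1.
        buckets[k * num_buckets // key_max].append(k)

    out = []
    for b in buckets:
        b.sort()       # sort each bucket independently
        out.extend(b)  # concatenating in bucket order gives a sorted list
    return out


random.seed(0)
data = [random.randrange(1 << 20) for _ in range(10_000)]
result = bucket_sort_uniform(data, num_buckets=64, key_max=1 << 20)
```

If the keys are skewed rather than uniform, the same code still sorts correctly, but a few buckets receive most of the data and the work is no longer balanced. That imbalance is the reason algorithms that make no distributional assumptions are treated as a separate case.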
Cite this article
Dixon, B., Swallow, J. High-performance sorting algorithms for the CRAY T3D parallel computer. J Supercomput 10, 371–395 (1997). https://doi.org/10.1007/BF00227864