Abstract
Chip multiprocessors designed for streaming applications such as Cell BE offer impressive peak performance but suffer from limited bandwidth to off-chip main memory. As the number of cores is expected to rise further, this bottleneck will become more critical in the coming years. Hence, memory-efficient algorithms are required. As a case study, we investigate parallel sorting on Cell BE as a problem of great importance and as a challenge where the ratio between computation and memory transfer is very low. Our previous work led to a parallel mergesort that reduces memory bandwidth requirements by pipelining between SPEs, but the allocation of SPEs was rather ad-hoc. In our present work, we investigate mappings of merger nodes to SPEs. The mappings are designed to provide optimal trade-offs between load balancing, buffer memory consumption, and communication load on the on-chip bus. We solve this multi-objective optimization problem by deriving an integer linear programming formulation and compute Pareto-optimal solutions for the mapping of merge trees with up to 127 merger nodes. For mapping larger trees, we give a fast divide-and-conquer based approximation algorithm. We evaluate the sorting algorithm resulting from our mappings by a discrete event simulation.
Chapter PDF
References
Chen, T., Raghavan, R., Dale, J.N., Iwata, E.: Cell broadband engine architecture and its first implementation—a performance view. IBM J. Res. Devel. 51(5), 559–572 (2007)
Huh, J., Keckler, S.W., Burger, D.: Exploring the design space of future CMPs. In: Proc. Int.l Conf. Parallel Architectures and Compilation Techniques (PACT 2001), pp. 199–210 (2001)
Akl, S.G.: Parallel Sorting Algorithms. Academic Press, London (1985)
JáJá, J.: An Introduction to Parallel Algorithms. Addison-Wesley, Reading (1992)
Gedik, B., Bordawekar, R., Yu, P.S.: Cellsort: High performance sorting on the cell processor. In: Proc. 33rd Intl. Conf. on Very Large Data Bases, pp. 1286–1207 (2007)
Inoue, H., Moriyama, T., Komatsu, H., Nakatani, T.: AA-sort: A new parallel sorting algorithm for multi-core SIMD processors. In: Proc. Int.l Conf. Parallel Architectures and Compilation Techniques (PACT 2007), pp. 189–198 (2007)
Shi, H., Schaeffer, J.: Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing 14, 361–372 (1992)
ILOG Inc.: Cplex version 10.2 (2007), http://www.ilog.com
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Keller, J., Kessler, C.W. (2009). Optimized Pipelined Parallel Merge Sort on the Cell BE. In: César, E., et al. Euro-Par 2008 Workshops - Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00955-6_18
Download citation
DOI: https://doi.org/10.1007/978-3-642-00955-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00954-9
Online ISBN: 978-3-642-00955-6
eBook Packages: Computer ScienceComputer Science (R0)