Abstract
We consider FFTs with 2D data decomposition on networks of multiprocessor nodes. In this application, the processors perform collective all-to-all communication in different groups independently and simultaneously, so the individual processors of a node may take part in independent collectives at the same time. The underlying communication algorithm should account for this. For short messages, we propose a sparse version of Bruck’s algorithm that handles such multiple collectives. We discuss the distribution of the FFT data to the nodes for the local and global application of Bruck’s original algorithm as well as for the proposed sparse version, and compare the performance of the different approaches.
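For orientation, the following is a minimal sketch of Bruck’s original radix-2 all-to-all algorithm, simulated sequentially in plain Python. It is not the sparse version proposed in the paper, whose details the abstract does not give. The function name `bruck_alltoall` and the nested-list layout (`send[s][d]` is the block that process `s` sends to process `d`) are assumptions made only for this illustration.

```python
def bruck_alltoall(send):
    """Simulate Bruck's radix-2 all-to-all exchange for p processes.

    send[s][d] is the block process s wants to deliver to process d.
    Returns recv with recv[d][s] == send[s][d].
    (Illustrative simulation; names and data layout are assumptions.)
    """
    p = len(send)

    # Phase 1: local rotation. On process i, slot j now holds the block
    # destined for process (i + j) % p, i.e. it must travel a distance j.
    buf = [[send[i][(i + j) % p] for j in range(p)] for i in range(p)]

    # Phase 2: ceil(log2(p)) communication rounds. In the round with
    # distance k, process i sends every slot j whose index has bit k set
    # to process (i + k) % p and receives the same slots from (i - k) % p.
    k = 1
    while k < p:
        # Snapshot all outgoing messages first so that every process
        # sends the data it held at the start of the round.
        outgoing = [
            {j: buf[i][j] for j in range(p) if j & k} for i in range(p)
        ]
        for i in range(p):
            src = (i - k) % p
            for j, block in outgoing[src].items():
                buf[i][j] = block
        k <<= 1

    # Phase 3: local inverse rotation. On process i, slot j now holds the
    # block that originated on process (i - j) % p.
    recv = [[None] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            recv[i][(i - j) % p] = buf[i][j]
    return recv
```

Calling `bruck_alltoall` with `send[s][d] = (s, d)` for, say, five processes returns a table in which `recv[d][s]` is `(s, d)`. Each block moves only in the rounds that match the set bits of its travel distance, which is why the exchange completes in a logarithmic number of rounds and suits short messages. In a 2D decomposition, one such exchange would run independently within each process group.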
References
Adler, M., Byers, J.W., Karp, R.M.: Scheduling Parallel Communication: The h-Relation Problem. In: Wiedermann, J., Hájek, P. (eds.) MFCS 1995. LNCS, vol. 969, pp. 1–20. Springer, Heidelberg (1995)
Brass, A., Pawley, G.S.: Two and three dimensional FFTs on highly parallel computers. Parallel Comput. 3, 167–184 (1986)
Bruck, J., Ho, C.T., Kipnis, S., Upfal, E., Weathersby, D.: Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE T. Parall. Distr. 8(11), 1143–1156 (1997)
Chan, A., Balaji, P., Gropp, W., Thakur, R.: Communication Analysis of Parallel 3D FFT for Flat Cartesian Meshes on Large Blue Gene Systems. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds.) HiPC 2008. LNCS, vol. 5374, pp. 350–364. Springer, Heidelberg (2008)
Fang, B., Deng, Y., Martyna, G.: Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer. Comput. Phys. Commun. 176, 531–538 (2007)
Fraigniaud, P., Lazard, E.: Methods and problems of communication in usual networks. Discrete Appl. Math. 53, 79–133 (1994)
Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. P IEEE 93(2), 216–231 (2005)
Goldman, A., Peters, J.G., Trystram, D.: Exchanging messages of different size. J. Parallel Distr. Com. 66, 1–18 (2006)
Gupta, A., Kumar, V.: The scalability of FFT on parallel computers. IEEE T. Parall. Distr. 4(8), 922–932 (1993)
Helman, D.R., Bader, D.A., JáJá, J.: Parallel algorithms for personalized communication and sorting with an experimental study (extended abstract). In: SPAA 1996: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 211–222. ACM, New York (1996)
Johnsson, S.L., Ho, C.T.: Optimum broadcasting and personalized communication in hypercubes. IEEE T. Comput. 38(9), 1249–1268 (1989)
van Loan, C.: Computational Frameworks for the Fast Fourier Transform. SIAM, Philadelphia (1992)
Sanders, P., Solis-Oba, R.: How helpers hasten h-relations. J. Algorithm. 41, 86–98 (2001)
Sanders, P., Träff, J.L.: The Hierarchical Factor Algorithm for All-to-All Communication. In: Monien, B., Feldmann, R. (eds.) Euro-Par 2002. LNCS, vol. 2400, pp. 799–803. Springer, Heidelberg (2002)
Swarztrauber, P.N.: Multiprocessor FFTs. Parallel Comput. 5, 197–210 (1987)
Takahashi, D.: An Implementation of Parallel 3-D FFT with 2-D Decomposition on a Massively Parallel Cluster of Multi-core Processors. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2009. LNCS, vol. 6067, pp. 606–614. Springer, Heidelberg (2010)
Thakur, R., Rabenseifner, R., Gropp, W.: Optimization of collective communication operations in MPICH. Int. J. High Perform. C. 19(1), 49–66 (2005)
Tipparaju, V., Nieplocha, J., Panda, D.: Fast collective operations using shared and remote memory access protocols on clusters. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France (April 2003)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jocksch, A. (2012). FFTs and Multiple Collective Communication on Multiprocessor-Node Architectures. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2011. Lecture Notes in Computer Science, vol 7203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31464-3_17
DOI: https://doi.org/10.1007/978-3-642-31464-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31463-6
Online ISBN: 978-3-642-31464-3
eBook Packages: Computer Science (R0)