Abstract
The competition for a higher performance/price ratio is pushing processor chip design into the manycore age. Existing multicore technologies such as caching no longer scale to dozens of processor cores, and even a moderate level of performance optimization requires direct handling of data locality and distribution. Such architectural complexity inevitably complicates programming. A better approach to abstraction than relying entirely on compiler optimization is to expose some performance-critical features of the system and expect the programmer to handle them explicitly. This paper studies an algebraic-semantic programming theory for a performance-transparent level of parallel programming covering array-based data layout, distribution, transfer, and their affinity with threads. The programming model helps to simplify system-level complexities and to answer crucial engineering questions through rigorous reasoning.
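To illustrate the kind of explicit, programmer-visible data distribution the abstract argues for, the sketch below (our own illustration, not the paper's PArray notation; all names are hypothetical) partitions a one-dimensional array into contiguous blocks assigned to workers, so that locality and ownership are handled explicitly by the programmer rather than hidden behind a cache:

```python
def block_partition(n, p):
    """Split indices 0..n-1 into p contiguous blocks whose sizes differ by at most 1.

    Each block would be owned by one worker (thread, core, or device),
    making data distribution an explicit, reasoned-about program object.
    """
    base, extra = divmod(n, p)  # base block size, and how many blocks get one extra element
    blocks, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def owner(index, blocks):
    """Return the worker that owns a given array index under this layout."""
    for worker, block in enumerate(blocks):
        if index in block:
            return worker
    raise IndexError(index)

if __name__ == "__main__":
    blocks = block_partition(10, 3)
    print([list(b) for b in blocks])  # → [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(owner(5, blocks))           # → 1
```

Because the layout is an explicit value, properties such as "every index has exactly one owner" become statements one can prove algebraically, which is the style of reasoning the paper's programming model is designed to support.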
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Chen, Y. (2013). Algebraic Program Semantics for Supercomputing. In: Liu, Z., Woodcock, J., Zhu, H. (eds) Theories of Programming and Formal Methods. Lecture Notes in Computer Science, vol 8051. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39698-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39697-7
Online ISBN: 978-3-642-39698-4