Abstract
This paper describes restructuring techniques for out-of-core programs (i.e., those that deal with very large quantities of data) based on exploiting locality using a combination of loop and data transformations. Writing efficient out-of-core programs is an arduous task. As a result, compiler optimizations directed at improving I/O performance are becoming increasingly important. We describe how a compiler can improve the performance of the code by determining appropriate file layouts for out-of-core arrays and finding suitable loop transformations. In addition to optimizing a single loop nest, our solution can handle a sequence of loop nests. We also show how to generate code when the file layouts are optimized. Preliminary experimental results obtained on an Intel Paragon distributed-memory message-passing multiprocessor demonstrate marked improvements in performance due to the optimizations described in this paper.
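To illustrate why file layout and loop order have to be chosen together, the following minimal Python sketch counts how many contiguous file requests a traversal of a two-dimensional array would need. It is not the paper's algorithm. The function name `count_io_calls`, the layout and traversal labels, and the cost model (one I/O request per maximal run of consecutive file offsets, with no limit on request size) are all assumptions made for this example.

```python
def count_io_calls(n_rows, n_cols, layout, traversal):
    """Count contiguous file runs touched when visiting every element once.

    Assumed cost model: each maximal run of consecutive file offsets
    costs one I/O request (hypothetical, for illustration only).
    """
    def offset(i, j):
        # Position of element (i, j) in the file under the given layout.
        if layout == "row-major":
            return i * n_cols + j
        return j * n_rows + i  # column-major

    if traversal == "row-wise":
        # i is the outer loop, j the inner loop.
        order = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    else:
        # Interchanged loops: j is the outer loop, i the inner loop.
        order = [(i, j) for j in range(n_cols) for i in range(n_rows)]

    calls = 0
    prev = None
    for i, j in order:
        off = offset(i, j)
        # A new request starts whenever the access is not the next offset.
        if prev is None or off != prev + 1:
            calls += 1
        prev = off
    return calls


if __name__ == "__main__":
    for layout in ("row-major", "column-major"):
        for traversal in ("row-wise", "column-wise"):
            print(layout, traversal, count_io_calls(4, 8, layout, traversal))
```

Under this simplified model, a traversal that matches the file layout needs a single request, while a mismatched one needs one request per element. Either changing the layout or interchanging the loops removes the mismatch, which is the kind of trade-off the abstract attributes to combined loop and data transformations.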
Copyright information
© 1999 Springer-Verlag
About this paper
Cite this paper
Kandemir, M., Choudhary, A., Ramanujam, J. (1999). Restructuring I/O-intensive computations for locality. In: Sloot, P., Bubak, M., Hoekstra, A., Hertzberger, B. (eds) High-Performance Computing and Networking. HPCN-Europe 1999. Lecture Notes in Computer Science, vol 1593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100670
DOI: https://doi.org/10.1007/BFb0100670
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65821-4
Online ISBN: 978-3-540-48933-7
eBook Packages: Springer Book Archive