ABSTRACT
The emergence of grid computing and a new class of data-driven applications is making a new form of parallelism desirable, which we refer to as coarse-grained pipelined parallelism. This paper reports on a compilation system developed to exploit this form of parallelism. We use a dialect of Java that exposes both pipelined and data parallelism to the compiler. Our compiler is responsible for selecting a set of candidate filter boundaries, determining the volume of communication required if a particular boundary is chosen, performing the decomposition, and generating code. We have developed a one-pass algorithm for determining the required communication between consecutive filters, a cost model for estimating the execution time of a given decomposition, and a dynamic programming algorithm for performing the decomposition. Detailed evaluation of our current compiler using four data-driven applications demonstrates the feasibility of our approach.