Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism

Authors:
Wei Du, Ohio State University, Columbus
Renato Ferreira, Universidade Federal de Minas Gerais, Brasil
Gagan Agrawal, Ohio State University, Columbus
SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, November 2003
https://doi.org/10.1145/1048935.1050159

Published: 15 November 2003

ABSTRACT

The emergence of grid computing and a new class of data-driven applications is making a new form of parallelism desirable, which we refer to as coarse-grained pipelined parallelism. This paper reports on a compilation system developed to exploit this form of parallelism. We use a dialect of Java that exposes both pipelined and data parallelism to the compiler. Our compiler is responsible for selecting a set of candidate filter boundaries, determining the volume of communication required if a particular boundary is chosen, performing the decomposition, and generating code. We have developed a one-pass algorithm for determining the required communication between consecutive filters. We have also developed a cost model for estimating the execution time of a given decomposition, and a dynamic programming algorithm for performing the decomposition. A detailed evaluation of our current compiler using four data-driven applications demonstrates the feasibility of our approach.
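
To make the decomposition step more concrete, the following is a minimal sketch of how a dynamic program can place filter boundaries along a chain of computing units. It is not the paper's algorithm or cost model, which the abstract does not spell out. It assumes a simplified cost model in which a decomposition is scored by its slowest pipeline stage (compute or link), and every name in it (`decompose`, `task_cost`, `boundary_volume`, `unit_speed`, `link_bandwidth`) is hypothetical.

```python
def decompose(task_cost, boundary_volume, unit_speed, link_bandwidth):
    """Split a chain of atomic tasks into contiguous filters, one per unit.

    task_cost[i]        work of atomic task i (assumed, arbitrary units)
    boundary_volume[i]  data sent per packet if a cut is placed after task i
    unit_speed[j]       work per second on computing unit j
    link_bandwidth[j]   data per second on the link from unit j to unit j + 1

    Returns (bottleneck, cuts): the time of the slowest stage and, for each
    link, the index of the last task placed before it.
    """
    n, m = len(task_cost), len(unit_speed)
    if not 1 <= m <= n:
        raise ValueError("need at least one task per computing unit")
    if len(boundary_volume) != n - 1 or len(link_bandwidth) != m - 1:
        raise ValueError("expected n - 1 boundaries and m - 1 links")

    # Prefix sums give the work of any contiguous task range in O(1).
    prefix = [0.0]
    for cost in task_cost:
        prefix.append(prefix[-1] + cost)

    def compute_time(first, last, unit):
        return (prefix[last + 1] - prefix[first]) / unit_speed[unit]

    inf = float("inf")
    # best[i][j]: smallest bottleneck when tasks 0..i run on units 0..j
    # and task i is the last task on unit j.
    best = [[inf] * m for _ in range(n)]
    prev = [[-1] * m for _ in range(n)]
    for i in range(n):
        best[i][0] = compute_time(0, i, 0)

    for j in range(1, m):
        for i in range(j, n):
            for k in range(j - 1, i):  # k = last task on unit j - 1
                stage = max(
                    best[k][j - 1],
                    boundary_volume[k] / link_bandwidth[j - 1],
                    compute_time(k + 1, i, j),
                )
                if stage < best[i][j]:
                    best[i][j] = stage
                    prev[i][j] = k

    # Walk the back-pointers to recover the chosen boundaries.
    cuts = []
    i = n - 1
    for j in range(m - 1, 0, -1):
        i = prev[i][j]
        cuts.append(i)
    cuts.reverse()
    return best[n - 1][m - 1], cuts


if __name__ == "__main__":
    # Four tasks, two units: cutting after task 1 sends the least data.
    print(decompose([4, 2, 6, 3], [10, 1, 8], [1, 1], [1]))  # (9.0, [1])
```

Scoring by the slowest stage reflects steady-state pipeline throughput only. A model that also accounts for pipeline fill time or the number of packets would change the recurrence, so this sketch should be read as an illustration of the technique rather than a reconstruction of the compiler's behavior.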

Published in
SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing
November 2003
859 pages
ISBN: 1581136951
DOI: 10.1145/1048935
General Chair: James R. McGraw
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
SC '03 paper acceptance rate: 60 of 207 submissions, 29%
Overall acceptance rate: 1,516 of 6,373 submissions, 24%