DOI: 10.1145/2134254.2134277
research-article

Identifying hotspots in a program for data parallel architecture: an early experience

Published: 22 February 2012

Abstract

In applications that rely on data-intensive computation, significant performance gains are possible if the source code is suitably transformed for parallel hardware. A common approach is to identify the loops in the program that consume a significant amount of time, which we call hotspots. A pressing business need is to identify such loops quickly for further transformation. However, exact identification of hotspots requires elaborate runtime analysis. For a third-party business application, only a partial version of the source code is available, with limited test inputs, which hinders accurate runtime analysis. We therefore resort to static analysis of the source code to obtain a conservative loop iteration count. In this paper we describe our approach to analyzing source code to find hotspots. The approach estimates the iteration count of a loop using the polytope model for volume computation, and combines this estimate with the cyclomatic complexity of the loop body. Together, the two metrics give an approximate indication of the hotspots in a program and serve as a code transformation clue for programmers. We have run our tool on the Rodinia benchmark applications and found encouraging results.
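The two metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool. It assumes a loop nest is described by inclusive integer bounds that may depend on the outer indices, and it counts the integer points of the iteration domain by plain enumeration, where a polytope-based tool would derive the count symbolically. Cyclomatic complexity uses McCabe's formula E - N + 2P on a hand-supplied control-flow graph size. The abstract does not say how the two metrics are combined, so the product used in `hotspot_score` is purely an assumption, and all function names are hypothetical.

```python
def iteration_count(bounds):
    """Count the integer points in a loop nest's iteration domain.

    bounds holds one (lower, upper) pair per loop level, outermost first.
    Each element is a function of the tuple of outer index values and
    returns an inclusive integer bound. Enumeration stands in for the
    symbolic volume computation a polytope-based tool would perform.
    """
    def count(level, outer):
        if level == len(bounds):
            return 1  # one execution of the innermost loop body
        lower, upper = bounds[level]
        return sum(
            count(level + 1, outer + (value,))
            for value in range(lower(outer), upper(outer) + 1)
        )
    return count(0, ())


def cyclomatic_complexity(num_edges, num_nodes, num_components=1):
    """McCabe's cyclomatic complexity: E - N + 2P."""
    return num_edges - num_nodes + 2 * num_components


def hotspot_score(bounds, num_edges, num_nodes):
    """Assumed combination (not from the paper): iterations times complexity."""
    return iteration_count(bounds) * cyclomatic_complexity(num_edges, num_nodes)


# Triangular nest: for i in 0..99, for j in 0..i.
triangular = [
    (lambda outer: 0, lambda outer: 99),
    (lambda outer: 0, lambda outer: outer[0]),
]

# Rectangular nest: for i in 0..9, for j in 0..9.
rectangular = [
    (lambda outer: 0, lambda outer: 9),
    (lambda outer: 0, lambda outer: 9),
]

# Body with a single if/else: 4 CFG nodes (test, then, else, join), 4 edges.
print(iteration_count(triangular))       # 5050
print(cyclomatic_complexity(4, 4))       # 2
print(hotspot_score(triangular, 4, 4))   # 10100

# Straight-line body: 1 CFG node, 0 edges, complexity 1.
print(hotspot_score(rectangular, 0, 1))  # 100
```

Under this assumed scoring, the triangular nest with a branching body ranks well above the small rectangular nest, which is the kind of ordering a programmer could use as a first hint about where to look for transformation candidates.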

Published In

ISEC '12: Proceedings of the 5th India Software Engineering Conference
February 2012
174 pages
ISBN:9781450311427
DOI:10.1145/2134254
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IITK: Indian Institute of Technology Kanpur

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU
  2. compilers
  3. loop analysis
  4. polytope model
  5. static analysis

Qualifiers

  • Research-article

Conference

ISEC '12
Sponsor:
  • IITK
ISEC '12: India Software Engineering Conference 2012
February 22 - 25, 2012
Kanpur, India

Acceptance Rates

ISEC '12 Paper Acceptance Rate 26 of 107 submissions, 24%;
Overall Acceptance Rate 76 of 315 submissions, 24%

