DOI: 10.1145/1504176.1504195
research-article

Comparability graph coloring for optimizing utilization of stream register files in stream processors

Published: 14 February 2009

Abstract

A stream processor executes an application that has been decomposed into a sequence of kernels that operate on streams of data elements. During the execution of a kernel, all streams it accesses must be communicated through the SRF (Stream Register File), a non-bypassing, software-managed on-chip memory. Optimizing utilization of the SRF is therefore crucial for good performance. Our key insight is that the interference graphs (IGs) formed by the streams in stream applications tend to be comparability graphs or to be decomposable into a set of comparability graphs. We present a compiler algorithm that finds optimal or near-optimal colorings of stream IGs, thereby achieving better SRF utilization than the First-Fit bin-packing algorithm, the best approach previously reported in the literature.
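The abstract does not spell out the algorithm, so the following is only a minimal sketch of why comparability graphs help, not the paper's method (which also handles IGs that decompose into several comparability graphs). It assumes that each stream occupies one contiguous SRF range, that two streams interfere when their ranges must not overlap, and that a transitive orientation of the IG is already known. Under those assumptions, giving each stream the weight of the heaviest chain of predecessors as its offset is optimal, because every chain in a transitive orientation is a clique. A generic First-Fit placement is shown as the baseline. The names `SIZE`, `ORIENTED`, `comparability_offsets` and `first_fit_offsets` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical toy input: four streams with sizes in SRF words.
# The IG is the path a-b-c-d (bipartite, hence a comparability graph),
# given here already transitively oriented: u -> set of successors.
SIZE = {"a": 2, "b": 1, "c": 1, "d": 2}
ORIENTED = {"a": {"b"}, "c": {"b", "d"}}


def edges_of(succ):
    return [(u, v) for u, vs in succ.items() for v in vs]


def check_transitive(succ):
    """Raise if u->v and v->w exist without u->w."""
    for u, vs in succ.items():
        for v in vs:
            for w in succ.get(v, ()):
                if w not in vs:
                    raise ValueError(f"not transitive: {u}->{v}->{w}")


def comparability_offsets(size, succ):
    """Offset of a stream = weight of the heaviest chain of its predecessors."""
    check_transitive(succ)
    preds = {v: set() for v in size}
    for u, v in edges_of(succ):
        preds[v].add(u)
    offset = {}
    # static_order() yields every node after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        offset[v] = max((offset[u] + size[u] for u in preds[v]), default=0)
    return offset


def first_fit_offsets(size, edges, order):
    """Baseline: place each stream, in the given order, at the lowest offset
    that avoids the already-placed streams it interferes with."""
    adj = {v: set() for v in size}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    offset = {}
    for v in order:
        busy = sorted((offset[u], offset[u] + size[u]) for u in adj[v] if u in offset)
        cand = 0
        for lo, hi in busy:
            if cand + size[v] <= lo:
                break  # the gap below this busy range is large enough
            cand = max(cand, hi)
        offset[v] = cand
    return offset


def footprint(size, offset):
    """Total SRF words needed by an allocation."""
    return max(offset[v] + size[v] for v in size)


def is_valid(size, edges, offset):
    """True if no two interfering streams overlap."""
    return all(offset[u] + size[u] <= offset[v] or offset[v] + size[v] <= offset[u]
               for u, v in edges)


if __name__ == "__main__":
    edges = edges_of(ORIENTED)
    ff = first_fit_offsets(SIZE, edges, order=["a", "d", "b", "c"])
    cg = comparability_offsets(SIZE, ORIENTED)
    print("First-Fit:", footprint(SIZE, ff))      # prints "First-Fit: 4"
    print("comparability:", footprint(SIZE, cg))  # prints "comparability: 3"
```

Here First-Fit places `d` at offset 0 beside `a`, which later forces `c` above `b` and costs 4 words; the chain-based offsets reach 3 words, the weight of the heaviest clique, which no valid allocation can beat.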

Cited By

  • (2012) WCET-aware data selection and allocation for scratchpad memory. ACM SIGPLAN Notices 47:5 (41-50). DOI: 10.1145/2345141.2248425. Online publication date: 12-Jun-2012
  • (2012) WCET-aware data selection and allocation for scratchpad memory. Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems (41-50). DOI: 10.1145/2248418.2248425. Online publication date: 12-Jun-2012
  • (2012) Comparability Graph Coloring for Optimizing Utilization of Software-Managed Stream Register Files for Stream Processors. ACM Transactions on Architecture and Code Optimization 9:1 (1-30). DOI: 10.1145/2133382.2133387. Online publication date: 1-Mar-2012

      Information

      Published In

      PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
      February 2009
      322 pages
      ISBN: 9781605583976
      DOI: 10.1145/1504176
      • ACM SIGPLAN Notices, Volume 44, Issue 4
        PPoPP '09
        April 2009
        294 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1594835
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. comparability graph coloring
      2. software-managed cache
      3. stream processor
      4. stream programming

      Qualifiers

      • Research-article

      Conference

      PPoPP '09

      Acceptance Rates

      Overall Acceptance Rate 230 of 1,014 submissions, 23%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 8
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 13 Feb 2025

      Citations

      • (2012) Single and multiple device DSA problems, complexities and online algorithms. Theoretical Computer Science 420 (89-98). DOI: 10.1016/j.tcs.2011.11.005. Online publication date: 1-Feb-2012
      • (2011) Loop fusion and reordering for register file optimization on stream processors. Proceedings of the 2011 ACM Symposium on Applied Computing (560-565). DOI: 10.1145/1982185.1982306. Online publication date: 21-Mar-2011
      • (2010) Improving scratchpad allocation with demand-driven data tiling. Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems (127-136). DOI: 10.1145/1878921.1878942. Online publication date: 24-Oct-2010
      • (2014) Making context-sensitive inclusion-based pointer analysis practical for compilers using parameterised summarisation. Software—Practice & Experience 44:12 (1485-1510). DOI: 10.1002/spe.2214. Online publication date: 1-Dec-2014
      • (2013) Energy-efficient stream task scheduling scheme for embedded multimedia applications on multi-issued stream architectures. Journal of Systems Architecture 59:4-5 (187-201). DOI: 10.1016/j.sysarc.2013.03.014. Online publication date: Apr-2013
      • (2010) Managing Data-Objects in Dynamically Reconfigurable Caches. Journal of Computer Science and Technology 25:2 (232-245). DOI: 10.1007/s11390-010-9320-6. Online publication date: 16-Mar-2010