DOI: 10.1145/1250662.1250689

A 64-bit stream processor architecture for scientific applications

Published: 09 June 2007

Abstract

Stream architecture is a novel microprocessor architecture with wide application potential, but many issues concerning its efficient use in scientific computing await further study. This paper first presents the design and implementation of FT64 (Fei Teng 64), a 64-bit stream processor for scientific computing. It describes the 64-bit extension and the scientific-computing-oriented optimizations in the instruction set architecture, stream controller, micro controller, ALU cluster, memory hierarchy, and interconnection interface. Second, two kinds of communication, message passing and stream communication, are proposed, and an interconnection based on them is designed for FT64-based high-performance computers. Third, a novel stream programming language, SF95 (Stream FORTRAN95), and its compiler, SF95Compiler (Stream FORTRAN95 Compiler), are developed to facilitate the development of scientific applications. Finally, nine typical scientific application kernels are tested, and the results show the efficiency of stream architecture for scientific computing.

    Published In

    ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
    June 2007
    542 pages
    ISBN: 9781595937063
    DOI: 10.1145/1250662
    General Chair: Dean Tullsen
    Program Chair: Brad Calder

    ACM SIGARCH Computer Architecture News, Volume 35, Issue 2
    May 2007
    527 pages
    ISSN: 0163-5964
    DOI: 10.1145/1273440
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. architecture
    2. compiler
    3. high performance computing
    4. program language
    5. scientific application
    6. stream processor

    Qualifiers

    • Article

    Conference

    SPAA07

    Acceptance Rates

    Overall Acceptance Rate 543 of 3,203 submissions, 17%
