DOI: 10.1145/2000064.2000078

TLSync: support for multiple fast barriers using on-chip transmission lines

Published: 04 June 2011

Abstract

As the number of cores on a single chip grows, scalable barrier synchronization becomes increasingly difficult to implement. In software implementations, such as the tournament barrier, a larger number of cores results in a longer latency for each round and a larger number of rounds. Hardware barrier implementations require significant dedicated wiring, e.g., using a reduction (arrival) tree and a notification (release) tree, and multiple instances of this wiring are needed to support multiple barriers (e.g., when concurrently executing multiple parallel applications).
This paper presents TLSync, a novel hardware barrier implementation that uses the high-frequency part of the spectrum in a transmission-line broadcast network, thus leaving the transmission-line network free for non-modulated (baseband) data transmission. In contrast to other implementations of hardware barriers, TLSync allows each of multiple thread groups to have its own barrier. This is accomplished by allocating different bands in the radio-frequency spectrum to different groups. Our circuit-level and electromagnetic models show that the worst-case latency for a TLSync barrier is 4ns to 10ns, depending on the size of the frequency band allocated to each group, and our cycle-accurate architectural simulations show that low-latency TLSync barriers provide significant performance and scalability benefits to barrier-intensive applications.
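
The abstract's remark that a software tournament barrier needs more rounds as the core count grows can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the textbook pairing scheme in which, in each arrival round, surviving "winner" cores wait for one "loser" partner each, and the function names `tournament_rounds` and `arrival_schedule` are hypothetical. It models only the structure of the arrival phase, not latency or the release phase.

```python
def tournament_rounds(n_cores: int) -> int:
    """Number of arrival rounds, i.e. ceil(log2(n_cores)), assuming pairwise matches."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (n_cores - 1).bit_length()


def arrival_schedule(n_cores: int) -> list[list[tuple[int, int]]]:
    """For each round, list the (winner, loser) core pairs under the assumed pairing."""
    schedule = []
    step = 1
    while step < n_cores:
        # A winner at index w waits for the loser at index w + step, if that core exists.
        pairs = [(w, w + step)
                 for w in range(0, n_cores, 2 * step)
                 if w + step < n_cores]
        schedule.append(pairs)
        step *= 2
    return schedule


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        print(n, "cores:", tournament_rounds(n), "arrival rounds")
```

Under these assumptions, 64 cores need 6 arrival rounds and 256 cores need 8, so the round count grows logarithmically with the number of cores, in addition to any increase in per-round latency.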

    Published In

    ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
    June 2011
    488 pages
    ISBN: 9781450304726
    DOI: 10.1145/2000064
    • ACM SIGARCH Computer Architecture News, Volume 39, Issue 3
      ISCA '11
      June 2011
      462 pages
      ISSN: 0163-5964
      DOI: 10.1145/2024723
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. barrier
    2. multi-core
    3. synchronization
    4. transmission line

    Qualifiers

    • Research-article

    Conference

    ISCA '11

    Acceptance Rates

    Overall Acceptance Rate 543 of 3,203 submissions, 17%

    Cited By

    • (2023) DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory Operations. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-13. DOI: 10.1145/3579371.3589065. Online publication date: 17-Jun-2023.
    • (2021) WiDir: A Wireless-Enabled Directory Cache Coherence Protocol. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 304-317. DOI: 10.1109/HPCA51647.2021.00034. Online publication date: Feb-2021.
    • (2021) SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 263-276. DOI: 10.1109/HPCA51647.2021.00031. Online publication date: Feb-2021.
    • (2020) Modeling of 300 GHz Chip-to-Chip Wireless Channels in Metal Enclosures. IEEE Transactions on Wireless Communications, 19:5, pp. 3214-3227. DOI: 10.1109/TWC.2020.2971206. Online publication date: May-2020.
    • (2019) Replica. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 849-863. DOI: 10.1145/3297858.3304033. Online publication date: 4-Apr-2019.
    • (2018) Broadcast- and Power-Aware Wireless NoC for Barrier Synchronization in Parallel Computing. 2018 31st IEEE International System-on-Chip Conference (SOCC), pp. 1-6. DOI: 10.1109/SOCC.2018.8618541. Online publication date: Sep-2018.
    • (2018) High Swing Pulse-Amplitude Modulation of Transmission Line Links for On-Chip Communication. 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5. DOI: 10.1109/ISCAS.2018.8351267. Online publication date: May-2018.
    • (2017) On-Chip Networks, Second Edition. Synthesis Lectures on Computer Architecture, 12:3, pp. 1-210. DOI: 10.2200/S00772ED1V01Y201704CAC040. Online publication date: 17-Jun-2017.
    • (2017) HyBar: high efficient barrier synchronization based on a hybrid packet-circuit switching Network-on-Chip. Science China Information Sciences, 60:6. DOI: 10.1007/s11432-016-0306-y. Online publication date: 9-Feb-2017.
    • (2016) Racer. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-13. DOI: 10.5555/3195638.3195678. Online publication date: 15-Oct-2016.
