Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures

Authors:

Sudhakar Yalamanchili

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 18, Issue 4

Article No.: 48, Pages 1 - 28

https://doi.org/10.1145/2504906

Published: 25 October 2013

Abstract

Current heterogeneous chip-multiprocessors (CMPs) integrate a GPU architecture on a die. However, the heterogeneity of this architecture inevitably exerts different pressures on shared resource management due to differing characteristics of CPU and GPU cores. We consider how to efficiently share on-chip resources between cores within the heterogeneous system, in particular the on-chip network. Heterogeneous architectures use an on-chip interconnection network to access shared resources such as last-level cache tiles and memory controllers, and this type of on-chip network will have a significant impact on performance.

In this article, we propose a feedback-directed virtual channel partitioning (VCP) mechanism for on-chip routers to effectively share network bandwidth between CPU and GPU cores in a heterogeneous architecture. VCP dedicates a few virtual channels to CPU and GPU applications with separate injection queues. The proposed mechanism balances on-chip network bandwidth for applications running on CPU and GPU cores by adaptively choosing the best partitioning configuration. As a result, our mechanism improves system throughput by 15% over the baseline across 39 heterogeneous workloads.
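
The abstract does not say how the best partitioning configuration is selected or which feedback signal is sampled. As a rough illustration only, the Python sketch below assumes a simple sample-and-select loop: each candidate split of virtual channels between CPU and GPU traffic is tried for one epoch, and the split with the highest measured throughput is kept. Every name in it (`candidate_partitions`, `choose_partition`, `toy_throughput`) and the throughput model are hypothetical and are not taken from the article.

```python
from typing import Callable, Dict, List, Tuple

# A partition is (cpu_vcs, gpu_vcs): how many virtual channels per port are
# dedicated to each traffic class. All names here are hypothetical.
Partition = Tuple[int, int]


def candidate_partitions(total_vcs: int) -> List[Partition]:
    """Enumerate splits that leave at least one VC for each traffic class."""
    return [(cpu, total_vcs - cpu) for cpu in range(1, total_vcs)]


def choose_partition(
    total_vcs: int,
    measure_throughput: Callable[[Partition], float],
) -> Partition:
    """Sample every candidate once and keep the best-performing one.

    measure_throughput stands in for a feedback signal (an assumption);
    the abstract does not state which metric the real mechanism uses.
    """
    scores: Dict[Partition, float] = {
        p: measure_throughput(p) for p in candidate_partitions(total_vcs)
    }
    return max(scores, key=lambda p: scores[p])


def toy_throughput(partition: Partition) -> float:
    """Made-up feedback model: CPU gains stop at 2 VCs, GPU keeps scaling."""
    cpu_vcs, gpu_vcs = partition
    return min(cpu_vcs, 2) * 1.0 + gpu_vcs * 0.4


# With 6 VCs per port, the toy model favors 2 CPU VCs and 4 GPU VCs.
best = choose_partition(6, toy_throughput)
```

In a real router the measurement would come from hardware counters collected over a sampling period, and the selection would be repeated as workload behavior changes; the exhaustive sweep here is only the simplest way to show the feedback idea.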

Index Terms

Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Networks
  1. Network architectures

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 18, Issue 4

Special Section on Networks on Chip: Architecture, Tools, and Methodologies

October 2013

380 pages

ISSN:1084-4309

EISSN:1557-7309

DOI:10.1145/2541012

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 25 October 2013

Accepted: 01 July 2013

Revised: 01 June 2013

Received: 01 January 2013

Published in TODAES Volume 18, Issue 4

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 34
Total Downloads: 463
Downloads (Last 12 months): 33
Downloads (Last 6 weeks): 5

Reflects downloads up to 20 Jan 2025

Cited By

Alaei M, Yazdanpanah F (2024). A Survey on Heterogeneous CPU–GPU Architectures and Simulators. Concurrency and Computation: Practice and Experience 37:1. Online publication date: 30-Oct-2024.
https://doi.org/10.1002/cpe.8318
Rout S, M B, Sinha M, Deb S (2023). ReDeSIGN: Reuse of Debug Structures for Improvement in Performance Gain of NoC Based MPSoCs. IEEE Transactions on Emerging Topics in Computing 11:2 (432-447). Online publication date: 1-Apr-2023.
https://doi.org/10.1109/TETC.2022.3203611
Fang J, Cheng H, Wei Z, Yang H (2023). DPBC-VCP: A Network-On-Chip Prioritization Mechanism Combined with VCP for CPU-GPU Heterogeneous Systems. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS) (1927-1934). Online publication date: 17-Dec-2023.
https://doi.org/10.1109/ICPADS60453.2023.00264
Fang J, Wei Z, Liu Y, Hou Y (2023). A Task-Based Routing Algorithm for Network-on-Chip in Heterogeneous CPU-GPU Architectures. 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (758-763). Online publication date: 17-Dec-2023.
https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00110
Li Y, Louri A (2021). ALPHA: A Learning-Enabled High-Performance Network-on-Chip Router Design for Heterogeneous Manycore Architectures. IEEE Transactions on Sustainable Computing 6:2 (274-288). Online publication date: 1-Apr-2021.
https://doi.org/10.1109/TSUSC.2020.2981340
Zheng H, Wang K, Louri A (2021). Adapt-NoC: A Flexible Network-on-Chip Design for Heterogeneous Manycore Architectures. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (723-735). Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00066
Wen H, Zhang W (2020). Denial of Service in CPU-GPU Heterogeneous Architectures. 2020 IEEE High Performance Extreme Computing Conference (HPEC) (1-5). Online publication date: 22-Sep-2020.
https://doi.org/10.1109/HPEC43674.2020.9286228
Morgan A, Hassan A, El-Kharashi M, Tawfik A (2020). NoC²: An Efficient Interfacing Approach for Heavily-Communicating NoC-Based Systems. IEEE Access 8 (185992-186011). Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.3030606
Wen H, Zhang W (2019). Improving Parallelism of Breadth First Search (BFS) Algorithm for Accelerated Performance on GPUs. 2019 IEEE High Performance Extreme Computing Conference (HPEC) (1-7). Online publication date: Sep-2019.
https://doi.org/10.1109/HPEC.2019.8916551
Wen H, Zhang W (2019). Heterogeneous Cache Hierarchy Management for Integrated CPU-GPU Architecture. 2019 IEEE High Performance Extreme Computing Conference (HPEC) (1-6). Online publication date: Sep-2019.
https://doi.org/10.1109/HPEC.2019.8916239
