research-article

Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ

Authors:
Mohammadsadegh Sadri, University of Bologna, Italy
Christian Weis, University of Kaiserslautern, Germany
Norbert Wehn, University of Kaiserslautern, Germany
Luca Benini, University of Bologna, Italy

FPGAworld '13: Proceedings of the 10th FPGAworld Conference, September 2013, Article No.: 5, Pages 1–8
https://doi.org/10.1145/2513683.2513688

Published: 10 September 2013

ABSTRACT

Cooperation between a CPU and a hardware accelerator on computationally intensive tasks provides significant advantages in run-time speed and energy. However, efficient management of data sharing among multiple computational kernels can rapidly turn into a complicated problem. The accelerator coherency port (ACP) emerges as a possible solution by enabling hardware accelerators to issue coherent accesses to the memory space. In this paper, we quantify the advantages of using the ACP over the traditional method of sharing data through DRAM. We select the Xilinx ZYNQ as the target and develop an infrastructure to stress the ACP and high-performance (HP) AXI interfaces of the ZYNQ device. Hardware accelerators on both the HP and ACP AXI interfaces reach a full-duplex data processing bandwidth of over 1.6 GBytes/s running at 125 MHz on an XC7Z020-1C device. We analyze the effect of background DRAM and cache traffic on accelerator performance. For a sample image filtering task, the cooperative operation of CPU and ACP accelerator (CPU-ACP) gains a speed-up of 1.2X over CPU and HP acceleration (CPU-HP). In terms of energy efficiency, an improvement of 2.5 nJ (> 20%) is shown for each byte of processed data. This is the first work to present detailed practical comparisons of the speed and energy efficiency of various processor-accelerator memory sharing techniques on a configurable heterogeneous platform.
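The headline figures in the abstract can be related to each other with some back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes a 64-bit AXI data path moving one beat per clock cycle in each direction (the abstract does not state the bus width), treats "over 1.6 GBytes/s" as a lower bound of 1.6e9 bytes/s, and uses hypothetical helper names.

```python
def axi_peak_bandwidth(clock_hz, data_width_bits, full_duplex=True):
    """Theoretical peak in bytes/s for a port moving one beat per cycle."""
    per_direction = clock_hz * data_width_bits // 8
    return per_direction * (2 if full_duplex else 1)


def max_baseline_energy(saving_nj, min_relative_saving):
    """Upper bound on the reference energy per byte, given that the
    absolute saving is more than min_relative_saving of that reference."""
    return saving_nj / min_relative_saving


CLOCK_HZ = 125_000_000   # 125 MHz, from the abstract
DATA_WIDTH_BITS = 64     # assumption: not stated in the abstract
MEASURED_BPS = 1.6e9     # "over 1.6 GBytes/s", taken as a lower bound

peak = axi_peak_bandwidth(CLOCK_HZ, DATA_WIDTH_BITS)
utilization = MEASURED_BPS / peak
baseline_bound = max_baseline_energy(2.5, 0.20)

print(f"peak full-duplex bandwidth: {peak / 1e9:.1f} GB/s")
print(f"measured / peak: at least {utilization:.0%}")
print(f"reference energy per byte: below {baseline_bound:.1f} nJ")
```

Under these assumptions, the reported bandwidth is at least 80% of a 2 GB/s theoretical full-duplex peak, and a saving of 2.5 nJ that exceeds 20% implies the reference configuration spends less than 12.5 nJ per byte.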

Index Terms

Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Hardware
  1. Very large scale integration design
    1. Application-specific VLSI designs

Published in
FPGAworld '13: Proceedings of the 10th FPGAworld Conference
September 2013
75 pages
ISBN:9781450324960
DOI:10.1145/2513683
General Chair:
Lennart Lindh
FPGAworld, Sweden
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
AXI masters
ZYNQ
accelerator coherency port
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 4 of 6 submissions, 67%
Article Metrics
- Total Citations: 56
- Total Downloads: 978
- Downloads (Last 12 months): 39
- Downloads (Last 6 weeks): 6