Abstract
C-Slow Retiming (CSR) generates concentric design clusters and improves the performance-per-area ratio of a design by reusing the combinatorial logic in a time-sliced fashion. The limitation of CSR is that all C copies of the design have to be executed continuously. The paper proposes System Hyper Pipelining (SHP), which overcomes this limitation by adding thread stalling, bypassing and fork-join queueing techniques. SHP has manifold implications for multithreading and multiprocessing systems. This paper concentrates on techniques to improve the performance of individual threads of SHP-based CPUs. SHP is ideal for FPGAs, given their high number of registers and their flexible memory usage. The paper compares standard implementations of CPUs with their CSR and SHP versions. Results based on three state-of-the-art 32-bit processors are shown.
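The time-slicing idea behind CSR can be illustrated with a minimal, cycle-level sketch. This is not the paper's implementation. It assumes the original design is a single accumulator register in a feedback loop (acc <= acc + x). C-slowing replaces that register with a chain of C registers. The one adder is then shared by C independent threads, and thread n mod C owns clock cycle n. The function names `c_slow_accumulator` and `independent_accumulators` are hypothetical and chosen for illustration only.

```python
from typing import List


def c_slow_accumulator(inputs: List[int], c: int) -> List[int]:
    """Cycle-level model of a C-slowed accumulator (illustrative sketch).

    The single feedback register of `acc <= acc + x` is replaced by a chain
    of `c` registers. The value leaving the chain in cycle n was written in
    cycle n - c, so each of the c threads sees only its own running sum.
    """
    regs = [0] * c  # register chain replacing the original single register
    outputs = []
    for x in inputs:
        acc = regs[-1] + x        # shared combinatorial logic (the adder)
        regs = [acc] + regs[:-1]  # advance the chain by one stage per clock
        outputs.append(acc)
    return outputs


def independent_accumulators(inputs: List[int], c: int) -> List[int]:
    """Reference model: c separate accumulators, fed round-robin by cycle."""
    totals = [0] * c
    outputs = []
    for n, x in enumerate(inputs):
        totals[n % c] += x        # thread n % c owns cycle n
        outputs.append(totals[n % c])
    return outputs


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(c_slow_accumulator(data, 3))        # [1, 2, 3, 5, 7, 9]
    print(independent_accumulators(data, 3))  # [1, 2, 3, 5, 7, 9]
```

In this model every cycle belongs to a fixed thread. An idle thread cannot give its slot to another thread, so the best it can do is feed a no-op input such as 0 while still consuming the cycle. This reflects the CSR limitation named above, that all C copies are executed continuously.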
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Strauch, T. (2015). The Effects of System Hyper Pipelining on Three Computational Benchmarks Using FPGAs. In: Sano, K., Soudris, D., Hübner, M., Diniz, P. (eds) Applied Reconfigurable Computing. ARC 2015. Lecture Notes in Computer Science, vol 9040. Springer, Cham. https://doi.org/10.1007/978-3-319-16214-0_23
DOI: https://doi.org/10.1007/978-3-319-16214-0_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16213-3
Online ISBN: 978-3-319-16214-0
eBook Packages: Computer Science, Computer Science (R0)