Function-Level Processor (FLP): A Novel Processor Class for Efficient Processing of Streaming Applications

Tabkhi, Hamed; Bushey, Robert; Schirner, Gunar

doi:10.1007/s11265-015-1058-5

Published: 06 November 2015

Volume 85, pages 287–306, (2016)

Journal of Signal Processing Systems

Hamed Tabkhi¹,
Robert Bushey² &
Gunar Schirner¹


Abstract

The exponential growth in computation demand drives chip vendors toward heterogeneous architectures that combine Instruction-Level Processors (ILPs) and custom HW Accelerators (HWACCs) in an attempt to provide the needed processing capabilities while meeting power/energy requirements. On the one hand, ILPs are highly flexible but power-inefficient. On the other hand, custom HWACCs are inflexible (focusing on dedicated kernels) but highly power-efficient. New processing architectures are needed that combine the power efficiency of HWACCs with enough flexibility to realize applications across targeted markets. This article introduces Function-Level Processors (FLPs) to fill the gap between ILPs and dedicated HWACCs. FLPs consist of configurable Function Blocks (FBs) that implement selected functions and are interconnected via programmable point-to-point connections, forming an extensible, configurable macro data-path. An FLP raises the programming abstraction to a Function-Set Architecture (FSA) that controls FB allocation, configuration and scheduling. We demonstrate the benefits of FLPs with an industry example, the Pipeline-Vision Processor (PVP). We highlight the gained flexibility by mapping 10 embedded vision applications entirely onto the FLP-PVP, achieving up to 22.4 GOPs/s at an average power of 120 mW. The results also show that our FLP-PVP solution consumes 1/18th to 1/14th of the power of an ILP and 1/5th of the power of a hybrid ILP+HWACCs solution.
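The FLP organization described in the abstract can be made concrete with a small software model. The Python sketch below is a hypothetical illustration, not the PVP's programming interface: the names `FunctionBlock`, `MacroDataPath`, `allocate`, `configure` and `stream`, and the two example FBs (`scale`, `threshold`), are assumptions introduced here. It mirrors the three FSA concerns the abstract names: allocation (choosing FBs and fixing their point-to-point order), configuration (setting FB parameters without changing the function an FB implements), and scheduling (reduced here to streaming samples through the chain in order).

```python
# Hypothetical, simplified model of an FLP macro data-path (not the PVP API).
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A kernel maps an input stream plus FB parameters to an output stream.
Kernel = Callable[[List[int], Dict[str, int]], List[int]]


@dataclass
class FunctionBlock:
    """A Function Block (FB): one fixed function, tuned only by parameters."""
    name: str
    kernel: Kernel
    config: Dict[str, int] = field(default_factory=dict)

    def run(self, samples: List[int]) -> List[int]:
        return self.kernel(samples, self.config)


class MacroDataPath:
    """FBs joined by programmable point-to-point links (a linear chain here)."""

    def __init__(self, library: Dict[str, FunctionBlock]) -> None:
        self.library = library
        self.chain: List[FunctionBlock] = []

    def allocate(self, names: List[str]) -> None:
        # Allocation + interconnect: pick FBs and fix the order data flows through them.
        self.chain = [self.library[n] for n in names]

    def configure(self, name: str, **params: int) -> None:
        # Configuration: change an FB's parameters, never its function.
        self.library[name].config.update(params)

    def stream(self, samples: List[int]) -> List[int]:
        # Simplest possible schedule: each FB consumes its predecessor's output.
        for fb in self.chain:
            samples = fb.run(samples)
        return samples


# Illustrative FB library: a gain stage followed by a binary threshold.
fbs = {
    "scale": FunctionBlock("scale", lambda s, c: [x * c.get("gain", 1) for x in s]),
    "threshold": FunctionBlock("threshold", lambda s, c: [int(x >= c.get("level", 0)) for x in s]),
}
path = MacroDataPath(fbs)
path.allocate(["scale", "threshold"])
path.configure("scale", gain=2)
path.configure("threshold", level=5)
result = path.stream([1, 2, 3, 4])
print(result)  # [0, 0, 1, 1]
```

In this model, retargeting the data-path to a different application means calling `allocate` or `configure` again rather than writing a new instruction stream, which is a much-simplified analogue of the raised programming abstraction the abstract attributes to the FSA.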


Notes

The second author of this paper is the chief architect of the PVP.
High-level synthesis tools can significantly simplify the process of programming FPGAs. Nonetheless, the architecture itself still retains bit-level programmability.
Note that the FLP and the PVP do not aim for full flexibility to execute arbitrary code; they target domain flexibility only, to take advantage of specialization.


Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, 02115, USA
Hamed Tabkhi & Gunar Schirner
Embedded Systems Products and Technology, Analog Devices Inc. (ADI), Norwood, MA, 02062, USA
Robert Bushey


Corresponding author

Correspondence to Hamed Tabkhi.


About this article

Cite this article

Tabkhi, H., Bushey, R. & Schirner, G. Function-Level Processor (FLP): A Novel Processor Class for Efficient Processing of Streaming Applications. J Sign Process Syst 85, 287–306 (2016). https://doi.org/10.1007/s11265-015-1058-5


Received: 18 November 2014
Revised: 17 August 2015
Accepted: 05 October 2015
Published: 06 November 2015
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11265-015-1058-5
