Function-Level Processor (FLP): A Novel Processor Class for Efficient Processing of Streaming Applications

Journal of Signal Processing Systems

Abstract

The exponential growth in computation demand drives chip vendors toward heterogeneous architectures that combine Instruction-Level Processors (ILPs) and custom hardware accelerators (HWACCs) to provide the needed processing capability within power and energy budgets. ILPs are highly flexible but power inefficient. Custom HWACCs are highly power efficient but inflexible, since each targets dedicated kernels. New processing architectures are needed that offer the power efficiency of HWACCs while retaining enough flexibility to realize applications across targeted markets. This article introduces Function-Level Processors (FLPs) to fill the gap between ILPs and dedicated HWACCs. An FLP comprises configurable Function Blocks (FBs), each implementing a selected function, interconnected via programmable point-to-point connections to form an extensible, configurable macro data-path. An FLP raises the programming abstraction to a Function-Set Architecture (FSA), which controls FB allocation, configuration, and scheduling. We demonstrate the benefits of FLPs with an industry example, the Pipeline-Vision Processor (PVP). We highlight the gained flexibility by mapping 10 embedded vision applications entirely to the FLP-PVP, which offers up to 22.4 GOPs/s at an average power of 120 mW. The results also show that our FLP-PVP solution consumes 1/18th to 1/14th of the power of an ILP and 1/5th of the power of a hybrid ILP+HWACCs solution.
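
To make the idea of function blocks linked by programmable point-to-point connections more concrete, the following is a minimal software sketch, not the PVP's actual programming interface. The class names (`FunctionBlock`, `MacroDataPath`), the three example blocks, and their parameters (`gain`, `bias`, `level`) are all hypothetical and chosen only for illustration. It models the three things the abstract says the FSA controls: which blocks are allocated, how each is configured, and the order in which they are scheduled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class FunctionBlock:
    """Hypothetical model of an FB: one fixed function plus its configuration."""
    name: str
    func: Callable[[int, Dict[str, int]], int]
    config: Dict[str, int] = field(default_factory=dict)

    def process(self, sample: int) -> int:
        return self.func(sample, self.config)


class MacroDataPath:
    """Hypothetical macro data-path: allocated FBs joined by point-to-point links."""

    def __init__(self, blocks: Iterable[FunctionBlock]) -> None:
        # Allocation: the set of blocks available to this data-path.
        self.blocks: Dict[str, FunctionBlock] = {b.name: b for b in blocks}
        # Each source block drives at most one destination block.
        self.links: Dict[str, str] = {}

    def configure(self, name: str, **params: int) -> None:
        # Configuration: set parameters of a block without changing its function.
        self.blocks[name].config.update(params)

    def connect(self, src: str, dst: str) -> None:
        # Program (or re-program) the link leaving `src`.
        if src not in self.blocks or dst not in self.blocks:
            raise KeyError(f"unknown block in link {src} -> {dst}")
        self.links[src] = dst

    def schedule(self, entry: str) -> List[str]:
        # Scheduling: follow the links from the entry block to the end of the chain.
        if entry not in self.blocks:
            raise KeyError(f"unknown entry block {entry}")
        order: List[str] = []
        current: Optional[str] = entry
        while current is not None:
            if current in order:
                raise ValueError("links form a cycle")
            order.append(current)
            current = self.links.get(current)
        return order

    def run(self, entry: str, stream: Iterable[int]) -> List[int]:
        # Stream each sample through the chain of blocks in scheduled order.
        order = self.schedule(entry)
        results: List[int] = []
        for sample in stream:
            value = sample
            for name in order:
                value = self.blocks[name].process(value)
            results.append(value)
        return results


def make_demo_path() -> MacroDataPath:
    """Build a data-path with three illustrative blocks (names are assumptions)."""
    return MacroDataPath([
        FunctionBlock("scale", lambda x, c: x * c.get("gain", 1)),
        FunctionBlock("offset", lambda x, c: x + c.get("bias", 0)),
        FunctionBlock("threshold", lambda x, c: 1 if x >= c.get("level", 0) else 0),
    ])
```

In this sketch, switching applications means calling `connect` and `configure` again rather than writing new block code: the same three blocks can be chained as `scale -> threshold` for one task and as `scale -> offset -> threshold` for another. That mirrors, in a very simplified way, the domain flexibility the abstract attributes to an FLP.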

Notes

  1. The second author of this paper is the chief architect of the PVP.

  2. High-level synthesis tools can significantly simplify the process of programming FPGAs. Nonetheless, the flexibility of bit-level programmability at the architecture level still remains.

  3. Note that FLP and PVP do not aim for full flexibility to execute any code, but they aim for domain flexibility only (to take advantage of specialization).

Corresponding author

Correspondence to Hamed Tabkhi.

Cite this article

Tabkhi, H., Bushey, R. & Schirner, G. Function-Level Processor (FLP): A Novel Processor Class for Efficient Processing of Streaming Applications. J Sign Process Syst 85, 287–306 (2016). https://doi.org/10.1007/s11265-015-1058-5

  • DOI: https://doi.org/10.1007/s11265-015-1058-5
