Exploiting Reconfigurable Vector Processing for Energy-Efficient Computation in 3D-Stacked Memories

de Lima, João Paulo C.; Santos, Paulo C.; de Moura, Rafael F.; Alves, Marco A. Z.; Beck, Antonio C. S.; Carro, Luigi

doi:10.1007/978-3-030-17227-5_19

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11444)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing

Abstract

Although Processing-in-Memory (PIM) architectures have helped to reduce the effect of the memory wall, the logic placed inside 3D-stacked memories still faces the large disparity between DRAM and CMOS logic operations. Consequently, for a broad range of emerging data-intensive applications, the Functional Units (FUs) are usually underutilized, especially when the application exhibits poor temporal locality. Because applications have irregular processing requirements across the different parts of their execution, this behavior can be used to reconfigure energy-reduction techniques, either by scaling frequency or by power-gating functional units. In this paper, we present the application-dependent characteristics that enable the dynamic use of energy-reduction techniques without performance degradation in highly constrained PIM designs. The experimental results show that exploiting a reconfiguration mechanism can improve PIM system energy efficiency by 5$\times$ and can benefit both memory-intensive and compute-intensive applications.
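The idea of choosing between frequency scaling and power gating per execution phase can be illustrated with a toy model. The sketch below is not the mechanism evaluated in the paper: it assumes a roofline-style execution time (the slower of compute and memory), a dynamic energy per operation that scales with the square of the relative frequency (a common proxy for voltage scaling), and leakage paid only by FUs that are not power-gated. Every constant and name (`Config`, `pick_config`, the bandwidth and energy figures) is a hypothetical placeholder.

```python
from dataclasses import dataclass
from itertools import product

# All constants below are hypothetical placeholders, not values from the paper.
MAX_FUS = 32          # FUs available in the PIM logic layer
MAX_FREQ_GHZ = 1.0    # nominal logic clock
BW_GBS = 320.0        # internal memory bandwidth in GB/s
DYN_NJ_PER_OP = 0.05  # dynamic energy per operation at nominal frequency
LEAK_W_PER_FU = 0.02  # leakage power of one FU that is not power-gated


@dataclass(frozen=True)
class Config:
    active_fus: int   # FUs left powered on; the rest are power-gated
    freq_ghz: float   # clock frequency of the active FUs


BASELINE = Config(MAX_FUS, MAX_FREQ_GHZ)


def exec_time_s(ops, bytes_moved, cfg):
    """Roofline-style time: the slower of compute and memory."""
    compute_s = ops / (cfg.active_fus * cfg.freq_ghz * 1e9)  # assumes 1 op/FU/cycle
    memory_s = bytes_moved / (BW_GBS * 1e9)
    return max(compute_s, memory_s)


def energy_j(ops, bytes_moved, cfg):
    """Dynamic energy (scaled by relative frequency squared) plus leakage."""
    scale = (cfg.freq_ghz / MAX_FREQ_GHZ) ** 2
    dynamic = ops * DYN_NJ_PER_OP * 1e-9 * scale
    leakage = cfg.active_fus * LEAK_W_PER_FU * exec_time_s(ops, bytes_moved, cfg)
    return dynamic + leakage


def pick_config(ops, bytes_moved,
                fu_options=(4, 8, 16, 32), freq_options=(0.25, 0.5, 1.0)):
    """Lowest-energy configuration that is no slower than the baseline."""
    deadline = exec_time_s(ops, bytes_moved, BASELINE)
    candidates = [Config(n, f) for n, f in product(fu_options, freq_options)]
    feasible = [c for c in candidates
                if exec_time_s(ops, bytes_moved, c) <= deadline]
    return min(feasible, key=lambda c: energy_j(ops, bytes_moved, c))


if __name__ == "__main__":
    print(pick_config(ops=1e9, bytes_moved=64e9))   # memory-bound phase
    print(pick_config(ops=1e12, bytes_moved=1e9))   # compute-bound phase
```

Under these assumptions, a memory-bound phase leaves room to gate FUs and lower the clock because memory time dominates, while a compute-bound phase keeps the full configuration, since any reduction would lengthen execution.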



Acknowledgment

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and by the Serrapilheira Institute (grant number Serra-1709-16621).

Author information

Authors and Affiliations

Informatics Institute, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
João Paulo C. de Lima, Paulo C. Santos, Rafael F. de Moura, Antonio C. S. Beck & Luigi Carro
Department of Informatics, Federal University of Paraná, Curitiba, Brazil
Marco A. Z. Alves

Corresponding author

Correspondence to João Paulo C. de Lima.

Editor information

Editors and Affiliations

Technical University of Darmstadt, Darmstadt, Germany
Christian Hochberger
Brigham Young University, Provo, UT, USA
Brent Nelson
Technical University of Darmstadt, Darmstadt, Germany
Andreas Koch
Queen’s University Belfast, Belfast, UK
Roger Woods
INESC-ID, Lisbon, Portugal
Pedro Diniz

About this paper

Cite this paper

de Lima, J.P.C., Santos, P.C., de Moura, R.F., Alves, M.A.Z., Beck, A.C.S., Carro, L. (2019). Exploiting Reconfigurable Vector Processing for Energy-Efficient Computation in 3D-Stacked Memories. In: Hochberger, C., Nelson, B., Koch, A., Woods, R., Diniz, P. (eds) Applied Reconfigurable Computing. ARC 2019. Lecture Notes in Computer Science, vol 11444. Springer, Cham. https://doi.org/10.1007/978-3-030-17227-5_19

DOI: https://doi.org/10.1007/978-3-030-17227-5_19
Published: 29 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17226-8
Online ISBN: 978-3-030-17227-5
eBook Packages: Computer Science, Computer Science (R0)
