
Enabling Near-Data Accelerators Adoption Through Investigation of Datapath Solutions

International Journal of Parallel Programming

Abstract

Processing-in-Memory (PIM), or Near-Data Accelerators (NDAs), have recently been revisited to mitigate the memory and power walls, driven mainly by the maturity of 3D-stacking manufacturing technology and by the growing demand for bandwidth and parallel data access in emerging, processing-hungry applications. However, since these designs are naturally decoupled from the main processor, at least three open issues must be tackled before PIM can be adopted: how to offload instructions from the host to the NDAs, given that many units can be placed throughout the memory; how to keep caches coherent between the host and the NDAs; and how to handle the internal communication among NDA units, which must exchange data to be fully exploited. In this work, we present an efficient design that addresses these challenges. Based on hybrid host-accelerator code, which provides fine-grained control, our design allows transparent offloading of NDA instructions directly from the host processor. Moreover, it proposes a data coherence protocol comprising an inclusion-policy-agnostic cache coherence mechanism that transparently shares data between the host processor and the NDA units, and a protocol for communication between different NDA units. The proposed mechanism allows full exploitation of the evaluated state-of-the-art design, achieving a speedup of up to 14.6× over an AVX architecture on the PolyBench suite, while spending, on average, 82% of the total time on processing and only 18% on the cache coherence and communication protocols.
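For illustration only, the sketch below shows what a hybrid host/NDA code path of the kind the abstract describes might look like in C: the host first writes back the operands so the in-memory units see up-to-date data, then issues the offloaded operation. The nda_flush_range and nda_vadd calls are hypothetical placeholders (stubbed with plain host code so the example compiles and runs); they are not the interface proposed in the paper.

```c
/*
 * Conceptual sketch only: illustrates a hybrid host/NDA code path
 * (transparent offloading plus a coherence step). nda_flush_range()
 * and nda_vadd() are hypothetical placeholders, stubbed with host
 * code so the example runs; they are not the paper's interface.
 */
#include <stdio.h>
#include <stddef.h>

/* Placeholder for a cache write-back: NDA units read DRAM directly,
 * so dirty host cache lines covering the operands must be written back
 * before offloading. A real system would issue clflush/clwb or rely on
 * a hardware coherence mechanism such as the one the paper proposes. */
static void nda_flush_range(const void *addr, size_t bytes)
{
    (void)addr;
    (void)bytes; /* no-op stub */
}

/* Placeholder for an offloaded vector addition. In a real design this
 * would be lowered to NDA instructions executed by functional units
 * inside the 3D-stacked memory; a plain loop stands in for it here. */
static void nda_vadd(double *dst, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Hybrid host/NDA kernel: the host prepares coherence, then offloads. */
static void vadd_offload(double *dst, const double *a, const double *b, size_t n)
{
    nda_flush_range(a, n * sizeof *a);   /* make host writes visible to the NDA */
    nda_flush_range(b, n * sizeof *b);
    nda_vadd(dst, a, b, n);              /* computation happens near the data */
}

int main(void)
{
    double a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, dst[4];
    vadd_offload(dst, a, b, 4);
    printf("%.0f %.0f %.0f %.0f\n", dst[0], dst[1], dst[2], dst[3]);
    return 0;
}
```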




Author information

Correspondence to Paulo C. Santos.


This study was financed in part by the CAPES (Finance Code 001), CNPq, FAPERGS, and Serrapilheira (Serra-1709-16621).


Cite this article

Santos, P.C., de Lima, J.P.C., de Moura, R.F. et al.: Enabling Near-Data Accelerators Adoption Through Investigation of Datapath Solutions. Int J Parallel Prog 49, 237–252 (2021). https://doi.org/10.1007/s10766-020-00674-y
