Abstract
Traditional computing hardware struggles to meet the heavy computational load imposed by rapidly growing Machine Learning (ML) and Artificial Intelligence (AI) workloads, such as Deep Neural Networks operating on Big Data. To obtain hardware that satisfies the low-latency and high-throughput requirements of these algorithms, non-Von Neumann computing architectures such as In-Memory Computing (IMC) have been extensively researched and experimented with over the last five years. This study analyses and reviews works designed to accelerate Machine Learning tasks. We investigate different architectural aspects and directions, provide comparative evaluations, discuss the challenges and limitations of IMC research, and outline possible future directions.
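To make the IMC idea concrete, here is a minimal worked sketch of the standard resistive-crossbar primitive surveyed in this area (an illustration of the general technique, not a formula taken from any single reviewed work): weights are programmed as device conductances G_ij, input activations are applied as row voltages V_i, and Kirchhoff's current law accumulates each column current

\[
I_j = \sum_{i=1}^{m} G_{ij}\, V_i ,
\]

so an m-row crossbar evaluates an entire matrix-vector product in a single read operation, inside the memory array itself, eliminating the processor-memory data movement that bottlenecks Von Neumann designs.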
Acknowledgements
The authors gratefully acknowledge financial support under project DST/INT/Czech/P-12/2019, reg. no. LTAIN19176.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Snášel, V., Dang, T.K., Pham, P.N.H., Küng, J., Kong, L. (2022). In-Memory Computing Architectures for Big Data and Machine Learning Applications. In: Dang, T.K., Küng, J., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2022. Communications in Computer and Information Science, vol 1688. Springer, Singapore. https://doi.org/10.1007/978-981-19-8069-5_2
DOI: https://doi.org/10.1007/978-981-19-8069-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8068-8
Online ISBN: 978-981-19-8069-5