DOI: 10.1145/3489517.3530478
research-article
Public Access

iMARS: an in-memory-computing architecture for recommendation systems

Published: 23 August 2022

Abstract

Recommendation systems (RecSys) suggest items to users by predicting their preferences from historical data. A typical RecSys handles large embedding tables and performs many embedding-table operations. The memory capacity and bandwidth of conventional computer architectures restrict RecSys performance. This work proposes iMARS, an in-memory-computing (IMC) architecture that accelerates the filtering and ranking stages of deep-neural-network-based RecSys. iMARS leverages IMC-friendly embedding tables implemented inside a ferroelectric FET-based IMC fabric. Circuit-level and system-level evaluations show that iMARS achieves a 16.8x improvement in end-to-end latency and a 713x improvement in energy over its GPU counterpart on the MovieLens dataset.
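To make the terms in the abstract concrete, the following is a minimal software sketch of an embedding-table lookup followed by a filtering stage and a ranking stage. It is not the iMARS design: the table contents, the function names (`lookup_and_pool`, `filter_candidates`, `rank`), the dot-product similarity, and the bias-based scoring that stands in for a neural ranking model are all assumptions chosen for illustration.

```python
from typing import List

# Hypothetical toy embedding table: one row (vector) per item ID.
ITEM_TABLE: List[List[float]] = [
    [1.0, 0.0],   # item 0
    [0.0, 1.0],   # item 1
    [1.0, 1.0],   # item 2
    [-1.0, 0.0],  # item 3
]


def lookup_and_pool(table: List[List[float]], ids: List[int]) -> List[float]:
    """Gather the rows for `ids` and sum-pool them into a single vector."""
    pooled = [0.0] * len(table[0])
    for i in ids:
        for d, value in enumerate(table[i]):
            pooled[d] += value
    return pooled


def dot(a: List[float], b: List[float]) -> float:
    """Dot-product similarity (an assumed metric, not taken from the paper)."""
    return sum(x * y for x, y in zip(a, b))


def filter_candidates(user_vec: List[float], table: List[List[float]], k: int) -> List[int]:
    """Filtering stage: keep the k items most similar to the user vector."""
    order = sorted(range(len(table)), key=lambda i: dot(user_vec, table[i]), reverse=True)
    return order[:k]


def rank(user_vec: List[float], table: List[List[float]],
         candidates: List[int], bias: List[float]) -> List[int]:
    """Ranking stage: rescore only the surviving candidates.

    A per-item bias added to the similarity is a toy stand-in for the
    neural ranking model that a real system would run here.
    """
    return sorted(candidates, key=lambda i: dot(user_vec, table[i]) + bias[i], reverse=True)


history = [0, 2]                                  # items the user interacted with
user_vec = lookup_and_pool(ITEM_TABLE, history)   # embedding lookup + pooling
candidates = filter_candidates(user_vec, ITEM_TABLE, k=2)
ranked = rank(user_vec, ITEM_TABLE, candidates, bias=[1.5, 0.0, 0.0, 0.0])
print(user_vec, candidates, ranked)
```

Even in this toy form, the lookup and the similarity search touch every table row they need once and perform very little arithmetic per byte read, which is the memory-bound access pattern the abstract attributes to embedding-table operations.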

Information & Contributors

Information

Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022
1462 pages
ISBN:9781450391429
DOI:10.1145/3489517

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

DAC '22: 59th ACM/IEEE Design Automation Conference
July 10 - 14, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 215
  • Downloads (last 6 weeks): 35
Reflects downloads up to 05 Mar 2025.

Citations

Cited By

  • (2025) Flips: A Flexible Partitioning Strategy Near Memory Processing Architecture for Recommendation System. IEEE Transactions on Parallel and Distributed Systems, 36(4):745-758. DOI: 10.1109/TPDS.2025.3539534. Online publication date: Apr-2025
  • (2025) Cross-Layer Modeling and Design of Content Addressable Memories in Advanced Technology Nodes for Similarity Search. IEEE Transactions on Electron Devices, 72(1):240-246. DOI: 10.1109/TED.2024.3506509. Online publication date: Jan-2025
  • (2025) Combination-Encoding Content-Addressable Memory Utilizing the Ferroelectric Hf-Zr-O Field-Effect-Transistor Array. ACS Applied Electronic Materials. DOI: 10.1021/acsaelm.4c02180. Online publication date: 6-Mar-2025
  • (2024) Smoothing Disruption Across the Stack: Tales of Memory, Heterogeneity, & Compilers. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1-10. DOI: 10.23919/DATE58400.2024.10546772. Online publication date: 25-Mar-2024
  • (2024) FeReX: A Reconfigurable Design of Multi-Bit Ferroelectric Compute-in-Memory for Nearest Neighbor Search. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1-6. DOI: 10.23919/DATE58400.2024.10546615. Online publication date: 25-Mar-2024
  • (2024) Accelerating Recommendation Systems With In-Memory Embedding Operations. IEEE Transactions on Circuits and Systems for Artificial Intelligence, 1(2):244-256. DOI: 10.1109/TCASAI.2024.3487817. Online publication date: Dec-2024
  • (2024) Hardware design and the fairness of a neural network. Nature Electronics, 7(8):714-723. DOI: 10.1038/s41928-024-01213-0. Online publication date: 25-Jul-2024
  • (2023) DeepCAM: A Fully CAM-based Inference Accelerator with Variable Hash Lengths for Energy-efficient Deep Neural Networks. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1-6. DOI: 10.23919/DATE56975.2023.10137068. Online publication date: Apr-2023
  • (2023) In-Memory Computing Accelerators for Emerging Learning Paradigms. Proceedings of the 28th Asia and South Pacific Design Automation Conference, pages 606-611. DOI: 10.1145/3566097.3568356. Online publication date: 16-Jan-2023
  • (2023) Design of a Compact Spin-Orbit-Torque-Based Ternary Content Addressable Memory. IEEE Transactions on Electron Devices, 70(2):506-513. DOI: 10.1109/TED.2022.3231569. Online publication date: Feb-2023
