Abstract:
Deep learning recommendation systems, such as Facebook’s DLRM, enhance user experiences by providing personalized recommendations on social platforms. CXL-based memory extension is gaining attention because existing server DRAM capacity cannot satisfy these models’ huge memory requirements. Typically, frequently accessed hot embedding data is stored in local memory, whereas occasionally accessed cold embedding data resides in CXL memory, with the hot/cold distinction based on training results. However, the hotness of embedding vectors can change between training sessions, making it difficult to maintain consistent inference latency as model sizes grow. This study explores techniques for accelerating large-scale DLRM inference through dynamic hot-data rearrangement. The proposed hotness score-based page promotion periodically promotes and demotes pages according to the changing hotness of embedding data. Additionally, prioritizing cache prefetch by hotness improves cache temporal locality, especially in multi-user scenarios. Simulation results demonstrate that the proposed approaches can enhance DLRM inference speed by up to 8.65% compared to existing techniques.
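The abstract's hotness score-based promotion/demotion idea can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class name, scoring, decay factor, and rebalance policy are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of hotness score-based page promotion/demotion.
# All names, thresholds, and the decay policy are illustrative assumptions;
# the paper's actual mechanism is not specified in this abstract.
from collections import defaultdict


class HotnessTracker:
    def __init__(self, local_capacity, decay=0.5):
        self.scores = defaultdict(float)      # embedding page -> hotness score
        self.local_capacity = local_capacity  # number of pages fitting in local DRAM
        self.decay = decay                    # ages out stale hotness between periods
        self.local_pages = set()              # pages currently held in local memory

    def record_access(self, page):
        # Each embedding lookup bumps the accessed page's hotness score.
        self.scores[page] += 1.0

    def rebalance(self):
        # Periodically promote the hottest pages into local memory and
        # demote the rest to CXL memory, then decay all scores so the
        # ranking can follow shifting access patterns over time.
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        hot = set(ranked[:self.local_capacity])
        promoted = hot - self.local_pages
        demoted = self.local_pages - hot
        self.local_pages = hot
        for page in self.scores:
            self.scores[page] *= self.decay
        return promoted, demoted


tracker = HotnessTracker(local_capacity=2)
for page in [0, 0, 0, 1, 1, 2]:
    tracker.record_access(page)
promoted, demoted = tracker.rebalance()
# pages 0 and 1 have the highest scores, so they now occupy local memory
```

In a real system the rebalance period, scoring granularity, and migration cost between DRAM and CXL memory would all be tuned against inference latency; the sketch only shows the score-then-migrate structure the abstract describes.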
Date of Conference: 19-22 May 2024
Date Added to IEEE Xplore: 02 July 2024