ABSTRACT
Efficient distributed deep neural network training requires mitigating stale gradients and stragglers. The stale gradient problem arises when a deep neural network is distributed and parallelized across multiple clusters or nodes. The proposed solution for stragglers is to use a distributed non-relational database to record the intermediate weight results along with their respective nodes. The results from the database are supplied to the parameter server. If a delay in the parameter data due to straggling is detected, the straggled work is immediately reconfigured on another node as a serverless function. In this approach, each node is equipped with a distributed in-memory cache, and a non-relational database resides at the parameter server. The parameter server acts as an intelligent node, using a runtime threshold and error analysis to fix the optimal value of K for K-SGD. The proposed solution for stale data is to efficiently utilize multiple GPUs with multiple levels of caching in the cloud for better performance and reduced response time. Response time is reduced by offloading and pushing data close to the nodes through multiple levels of distributed cache. Using the GPU cache and Elastic Cache, data is propagated to the individual nodes at optimal time intervals. Together, these techniques form an integrated solution for stragglers and staleness in both data-parallel and model-parallel distributed deep learning.
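The sketch below illustrates the straggler-handling idea described above: a parameter server that waits for K of N gradients per step, persists intermediate per-node results in a key-value store standing in for the non-relational database, and re-schedules nodes that exceed a runtime threshold. It is a minimal illustration, not the authors' implementation; the names `poll_worker`, `relaunch_as_serverless`, and `kv_store` are hypothetical placeholders.

```python
# Minimal K-of-N parameter-server sketch (illustrative only, not the paper's code).
import time
import numpy as np


class KSGDParameterServer:
    def __init__(self, model_dim, num_workers, k, lr=0.01, straggler_timeout=2.0):
        self.weights = np.zeros(model_dim)          # global model parameters
        self.num_workers = num_workers
        self.k = k                                  # wait for K of N gradients per step
        self.lr = lr
        self.straggler_timeout = straggler_timeout  # runtime threshold for straggler detection
        self.kv_store = {}                          # stand-in for the non-relational DB of
                                                    # intermediate (node_id -> gradient) results

    def collect_round(self, poll_worker, relaunch_as_serverless):
        """Gather gradients until K arrive or the runtime threshold expires.

        poll_worker(i) returns a gradient array or None if node i has not finished;
        relaunch_as_serverless(i) re-runs a straggler's shard elsewhere (hypothetical hook).
        """
        start = time.time()
        pending = set(range(self.num_workers))
        while len(self.kv_store) < self.k and pending:
            for i in list(pending):
                grad = poll_worker(i)
                if grad is not None:
                    self.kv_store[i] = grad         # persist the intermediate result per node
                    pending.discard(i)
            if time.time() - start > self.straggler_timeout:
                for i in pending:                   # nodes still missing are stragglers:
                    relaunch_as_serverless(i)       # re-schedule their work as a serverless fn
                break
            time.sleep(0.01)

    def apply_update(self):
        """Average whatever gradients arrived this round and take one SGD step."""
        if not self.kv_store:
            return
        avg_grad = np.mean(list(self.kv_store.values()), axis=0)
        self.weights -= self.lr * avg_grad
        self.kv_store.clear()
```

For the staleness side, the abstract's multi-level caching can be pictured as a two-tier lookup: a node-local in-memory/GPU-side cache backed by a shared distributed cache, with the parameter server as the source of truth. Again this is a hedged sketch; `elastic_cache` and `fetch_from_parameter_server` are assumed interfaces, not a specific product API.

```python
# Two-level cache lookup sketch: node-local dict backed by a shared cache client.
def fetch_weights(key, local_cache, elastic_cache, fetch_from_parameter_server):
    """Return the freshest copy of `key`, pulling it closer to the node on a miss."""
    if key in local_cache:                        # level 1: node-local cache hit
        return local_cache[key]
    value = elastic_cache.get(key)                # level 2: shared distributed cache
    if value is None:
        value = fetch_from_parameter_server(key)  # fall back to the parameter server
        elastic_cache.set(key, value)             # push the data close to the nodes
    local_cache[key] = value
    return value
```

The design intent in both sketches is the same as in the abstract: keep frequently needed parameters near the workers and bound how long any single slow node can delay a synchronization step.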
Index Terms
- Non-relational multi-level caching for mitigation of staleness & stragglers in distributed deep learning