
MMDataLoader: Reusing Preprocessed Data Among Concurrent Model Training Tasks



Abstract:

Data preprocessing plays an important role in deep learning and directly affects training efficiency. Preprocessing is typically performed on the CPU, and the preprocessed data are then fed to models trained on the GPU. We observe that data preprocessing on the CPU can become a bottleneck for the entire model training task. To tackle this issue, we have developed MMDataLoader, which enables preprocessed data to be reused among multiple model training tasks. MMDataLoader automatically constructs a data preprocessing pipeline based on each task's specific preprocessing workflow, maximizing data reuse and reducing the computing workload on the CPU. Unlike conventional data loaders, which operate at the task level and serve data to a single training task, MMDataLoader operates at the server level and provides data for all concurrently running tasks. Extensive experiments show that, compared with the conventional approach in which concurrent training tasks each preprocess their own data, MMDataLoader significantly increases preprocessing throughput without affecting model convergence. For instance, with three tasks running, preprocessing throughput increases by 1.6x to 3.15x, depending on the tasks being executed and the proportion of preprocessing operations shared among them.
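
The core idea described in the abstract, reusing the preprocessing operations that concurrent tasks have in common, can be illustrated with a minimal sketch. The paper's implementation is not shown on this page, so all names below (SharedPipeline, the example operations) are hypothetical stand-ins rather than MMDataLoader's actual API. The sketch finds the prefix of operations shared by all registered tasks, computes it once per sample, caches the intermediate result, and applies each task's remaining operations to the cached value.

```python
# Hypothetical sketch of preprocessed-data reuse across concurrent
# training tasks; not MMDataLoader's actual API.
from typing import Callable, Dict, List, Tuple

Op = Tuple[str, Callable]  # (operation name, operation function)

class SharedPipeline:
    """Runs the operations shared by all registered tasks once per
    sample, caches the intermediate result, and applies each task's
    remaining (task-specific) operations to the cached value."""

    def __init__(self, task_pipelines: Dict[str, List[Op]]):
        self.tasks = task_pipelines
        self.prefix = self._common_prefix(list(task_pipelines.values()))
        self.cache: Dict[int, object] = {}  # sample id -> shared result

    @staticmethod
    def _common_prefix(pipelines: List[List[Op]]) -> List[Op]:
        # Ops are considered shared when their names match across all tasks.
        prefix = []
        for ops in zip(*pipelines):
            if len({name for name, _ in ops}) != 1:
                break
            prefix.append(ops[0])
        return prefix

    def get(self, task: str, sample_id: int, raw):
        # Shared prefix: computed once per sample, reused by every task.
        if sample_id not in self.cache:
            x = raw
            for _, fn in self.prefix:
                x = fn(x)
            self.cache[sample_id] = x
        x = self.cache[sample_id]
        # Task-specific suffix: computed separately for each task.
        for _, fn in self.tasks[task][len(self.prefix):]:
            x = fn(x)
        return x

# Example: two tasks share "decode" and "resize" but differ afterwards.
decode = ("decode", lambda s: s.upper())  # stand-in for image decoding
resize = ("resize", lambda s: s[:8])      # stand-in for resizing
flip   = ("flip",   lambda s: s[::-1])
gray   = ("gray",   lambda s: s.lower())

pipe = SharedPipeline({
    "taskA": [decode, resize, flip],
    "taskB": [decode, resize, gray],
})
raw = "example-image-bytes"
print(pipe.get("taskA", 0, raw))  # shared prefix computed here
print(pipe.get("taskB", 0, raw))  # shared prefix reused from cache
```

In a real server-level loader the cache would have to be bounded and shared across the processes feeding the GPUs; the sketch only shows the reuse logic that lets the shared portion of the preprocessing work be done once instead of once per task.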
Published in: IEEE Transactions on Computers (Volume: 73, Issue: 2, February 2024)
Page(s): 510 - 522
Date of Publication: 23 November 2023
