Abstract:
Existing GPU-based approaches cannot yet meet the performance requirements for training very large convolutional neural networks (CNNs), where convolutional layers (Conv-layers) dominate the training time. In this paper, we find that no single convolution policy is always the fastest across all computing phases. We then propose HyConv, an approach that accelerates multi-phase CNN computation through fine-grained policy selection. HyConv encapsulates existing convolution policies into a set of modules and, for each phase, selects the fastest policy (the winner policy) via a one-round runtime measurement. Furthermore, HyConv maintains a winner database that records the current winner policies, avoiding repeated measurements for the same parameter configuration. Our experimental results indicate that, across all the real-world CNNs evaluated, HyConv consistently outperforms existing approaches on both a single GPU and four GPUs, with speedups over cuDNN-MM of up to 3.3× and up to 1.6×, respectively. This improvement is explained by our finding that HyConv delivers markedly better performance on most individual Conv-layers. Moreover, HyConv works with any parameter configuration and thus offers better usability.
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume 30, Issue 2, 1 February 2019)
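
The abstract describes a measure-once, reuse-later selection scheme: benchmark every policy one time per parameter configuration, record the winner in a database, and dispatch directly on later hits. The sketch below is only an illustration of that idea, not the authors' implementation; the policy names and the CPU-side dummy work are hypothetical stand-ins for real GPU convolution kernels.

```python
import time

# Hypothetical stand-ins for convolution policies (e.g., GEMM-, FFT-, or
# Winograd-based). Real policies would launch GPU kernels; these just burn
# a little CPU time so the selection logic has something to measure.
def policy_gemm(config):
    return sum(range(10_000))

def policy_fft(config):
    return sum(range(20_000))

POLICIES = {"gemm": policy_gemm, "fft": policy_fft}

# Winner database: maps a parameter configuration (here, just an input
# shape tuple) to the policy that measured fastest, so each configuration
# is benchmarked only once.
winner_db = {}

def select_policy(config):
    if config in winner_db:                 # reuse the recorded winner
        return winner_db[config]
    best_name, best_time = None, float("inf")
    for name, policy in POLICIES.items():   # one-round runtime measurement
        start = time.perf_counter()
        policy(config)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    winner_db[config] = best_name           # record for later phases
    return best_name

def run_conv(config):
    return POLICIES[select_policy(config)](config)

# First call measures all policies for this configuration; the second call
# with the same configuration hits the winner database directly.
run_conv((64, 3, 224, 224))
run_conv((64, 3, 224, 224))
```

One design point this makes concrete: the measurement cost is paid once per distinct parameter configuration, so the scheme amortizes well over the many iterations of CNN training, where the same Conv-layer shapes recur in every pass.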