Distributed deep learning system for cancerous region detection on Sunway TaihuLight

  • Regular Paper
  • CCF Transactions on High Performance Computing

Abstract

To explore the potential of distributed training of deep neural networks, we implement several distributed algorithms on top of swFlow on the world-leading supercomputer Sunway TaihuLight. Starting from two naive designs, parameter server and ring all-reduce, we identify the limitations of their communication models and discuss optimizations that adapt them to the five-level interconnect architecture of the Sunway system. To relieve the communication bottleneck on large-scale systems, we introduce multi-server and hierarchical ring all-reduce models. On a benchmark taken from a deep learning-based cancerous region detection algorithm, the average parallel efficiency exceeds 80% on up to 1024 processors. This result shows the strong potential of combining deep learning with HPC systems.
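The two all-reduce models named in the abstract can be illustrated with a small single-process simulation. The sketch below is not the swFlow implementation: the function names `ring_allreduce` and `hierarchical_allreduce`, the `group_size` parameter, and the three-phase hierarchical scheme (intra-group ring, ring among group leaders, intra-group broadcast) are assumptions chosen for illustration, and workers are modelled as plain Python lists rather than processes on the Sunway interconnect.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over a logical ring of p workers.

    grads[r] is the gradient vector held by worker r. The result holds,
    for every worker, the element-wise sum over all workers.
    """
    p, n = len(grads), len(grads[0])
    # Chunk c covers the index range [bounds[c], bounds[c + 1]).
    bounds = [c * n // p for c in range(p + 1)]
    bufs = [list(g) for g in grads]

    def exchange(chunk_of, accumulate: bool) -> None:
        # Snapshot outgoing chunks first so all workers "send" at once.
        msgs = []
        for r in range(p):
            c = chunk_of(r)
            msgs.append((c, bufs[r][bounds[c]:bounds[c + 1]]))
        for r, (c, data) in enumerate(msgs):
            dst = (r + 1) % p  # each worker sends to its right neighbour
            for i, v in enumerate(data):
                if accumulate:
                    bufs[dst][bounds[c] + i] += v
                else:
                    bufs[dst][bounds[c] + i] = v

    # Reduce-scatter: after p - 1 steps worker r owns the fully
    # reduced chunk (r + 1) mod p.
    for s in range(p - 1):
        exchange(lambda r: (r - s) % p, accumulate=True)
    # All-gather: circulate the reduced chunks around the ring.
    for s in range(p - 1):
        exchange(lambda r: (r + 1 - s) % p, accumulate=False)
    return bufs


def hierarchical_allreduce(grads: List[List[float]],
                           group_size: int) -> List[List[float]]:
    """Two-level variant (assumed scheme, for illustration only)."""
    p = len(grads)
    groups = [list(range(i, min(i + group_size, p)))
              for i in range(0, p, group_size)]
    # Phase 1: ring all-reduce inside each group.
    local = [ring_allreduce([grads[r] for r in g]) for g in groups]
    # Phase 2: ring all-reduce among one leader per group.
    leaders = ring_allreduce([res[0] for res in local])
    # Phase 3: each leader broadcasts the global sum to its group.
    out: List[List[float]] = [[] for _ in range(p)]
    for g, total in zip(groups, leaders):
        for r in g:
            out[r] = list(total)
    return out


if __name__ == "__main__":
    demo = [[float(r * 10 + i) for i in range(7)] for r in range(6)]
    print(ring_allreduce(demo)[0])
    print(hierarchical_allreduce(demo, group_size=4)[0])
```

In the flat ring, every worker takes part in 2(p - 1) communication steps, so the step count grows with the number of workers. Grouping workers shortens each ring, which is one plausible reason a hierarchical design suits a multi-level interconnect.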

Notes

  1. The First Affiliated Hospital of University of Science and Technology of China.

  2. https://www.computationalpathologygroup.eu/.

  3. https://camelyon16.grand-challenge.org/.

Acknowledgements

This work is supported by the National Key Research and Development Program of China (Grant no. 2016YFB1000403) and the Fundamental Research Funds for the Central Universities of China.

Author information

Corresponding author

Correspondence to GuoFeng Lv.

About this article

Cite this article

Lv, G., Li, M., An, H. et al. Distributed deep learning system for cancerous region detection on Sunway TaihuLight. CCF Trans. HPC 2, 348–361 (2020). https://doi.org/10.1007/s42514-020-00046-5

  • DOI: https://doi.org/10.1007/s42514-020-00046-5
