DOI: 10.1145/3292500.3330756
research-article

Chainer: A Deep Learning Framework for Accelerating the Research Cycle

Published: 25 July 2019

Abstract

Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which aims to provide a flexible, intuitive, and high-performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units with a familiar NumPy-like API through CuPy, supports general and dynamic models in Python through Define-by-Run, and also provides add-on packages for state-of-the-art computer vision models as well as distributed training.
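
To make the Define-by-Run idea concrete, the following is a minimal sketch in plain Python, not Chainer's actual implementation or API. It assumes a hypothetical scalar class `Var` that records its parents while the forward computation runs, so an ordinary Python loop decides the shape of the graph. The names `Var`, `model`, and `n_steps` are illustrative assumptions, not taken from the paper.

```python
class Var:
    """Hypothetical scalar variable that records how it was computed.

    This is an illustration of Define-by-Run, not Chainer's Variable class.
    """

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local gradient)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Collect the recorded graph in topological order (inputs first).
        order, seen = [], set()

        def visit(v):
            if id(v) in seen:
                return
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)

        visit(self)
        # Propagate gradients from the output back to the inputs.
        self.grad = 1.0
        for v in reversed(order):
            for parent, local_grad in v.parents:
                parent.grad += local_grad * v.grad


def model(x, n_steps):
    # Ordinary Python control flow: the graph depth depends on n_steps,
    # which is only known when the forward pass actually runs.
    y = Var(1.0)
    for _ in range(n_steps):
        y = y * x
    return y


x = Var(2.0)
y = model(x, 3)   # computes x ** 3 and records the graph on the fly
y.backward()
print(y.value, x.grad)  # 8.0 12.0
```

Because the graph is recorded during execution, calling `model` with a different `n_steps` yields a different graph without any separate graph-definition step. That is the property the abstract refers to as supporting "general and dynamic models in Python".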


    Published In

    KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2019
    3305 pages
    ISBN:9781450362016
    DOI:10.1145/3292500
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. computer vision
    2. deep learning frameworks
    3. distributed training
    4. gpu computing

    Qualifiers

    • Research-article

    Conference

    KDD '19
    Acceptance Rates

    KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)80
    • Downloads (Last 6 weeks)11
    Reflects downloads up to 07 Mar 2025

    Citations

    Cited By

    • (2025) Using Machine Learning Hardware to Solve Linear Partial Differential Equations with Finite Difference Methods. International Journal of Parallel Programming 53:2. DOI: 10.1007/s10766-025-00791-6. Online publication date: 4-Mar-2025
    • (2024) Perspektywy zastosowania sztucznej inteligencji i potencjał technik multimedialnych w kształceniu kadr wymiaru sprawiedliwości. Kwartalnik Krajowej Szkoły Sądownictwa i Prokuratury (99-113). DOI: 10.53024/7.4.56.2024. Online publication date: 31-Dec-2024
    • (2024) Optical Imaging Model Based on GPU-Accelerated Monte Carlo Simulation for Deep-Sea Luminescent Objects. Remote Sensing 16:13 (2429). DOI: 10.3390/rs16132429. Online publication date: 2-Jul-2024
    • (2024) Reconstructing High Dynamic Range Image from a Single Low Dynamic Range Image Using Histogram Learning. Applied Sciences 14:21 (9847). DOI: 10.3390/app14219847. Online publication date: 28-Oct-2024
    • (2024) Distributed Training of Large Language Models on AWS Trainium. Proceedings of the 2024 ACM Symposium on Cloud Computing (961-976). DOI: 10.1145/3698038.3698535. Online publication date: 20-Nov-2024
    • (2024) PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (929-947). DOI: 10.1145/3620665.3640366. Online publication date: 27-Apr-2024
    • (2024) dxtb—An efficient and fully differentiable framework for extended tight-binding. The Journal of Chemical Physics 161:6. DOI: 10.1063/5.0216715. Online publication date: 9-Aug-2024
    • (2024) LLM-Commentator: Novel fine-tuning strategies of large language models for automatic commentary generation using football event data. Knowledge-Based Systems 300 (112219). DOI: 10.1016/j.knosys.2024.112219. Online publication date: Sep-2024
    • (2024) Big data analytics deep learning techniques and applications: A survey. Information Systems 120 (102318). DOI: 10.1016/j.is.2023.102318. Online publication date: Feb-2024
    • (2024) Machine learning heralding a new development phase in molecular dynamics simulations. Artificial Intelligence Review 57:4. DOI: 10.1007/s10462-024-10731-4. Online publication date: 29-Mar-2024
