
SRNET: A Shallow Skip Connection Based Convolutional Neural Network Design for Resolving Singularities

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Convolutional neural networks (CNNs) have made tremendous progress in recent years. Since their emergence, CNNs have delivered excellent performance on most classification and segmentation tasks, and the CNN family now includes a range of architectures that dominate major vision-based recognition benchmarks. However, building a neural network (NN) by simply stacking convolution blocks limits its optimization ability and introduces overfitting and vanishing-gradient problems. A key cause of these issues is network singularities, which produce degenerate manifolds in the loss landscape, slowing learning and lowering performance. Skip connections have proven to be an essential element of CNN design for mitigating such singularities. This research introduces skip connections into the NN architecture to augment information flow, mitigate singularities, and improve performance. We experiment with skip connections at different levels and propose a placement strategy for these links that applies to any CNN. To test this hypothesis, we design an experimental CNN architecture, named Shallow Wide ResNet (SRNet), which uses a wide residual network as its base design. We perform extensive experiments on two well-known datasets, CIFAR-10 and CIFAR-100, for training and testing. The empirical results show promising gains in performance, efficiency, and the reduction of network-singularity issues.
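
The full text is available to subscribers only, but the core mechanism the abstract describes is easy to illustrate. Below is a minimal sketch of a pre-activation residual (skip-connection) block in the style of Wide ResNet, the base design the abstract names for SRNet. This is an illustrative PyTorch reconstruction, not the authors' released code; the class name, channel widths, and strides are assumptions.

```python
# Minimal sketch of a pre-activation residual (skip-connection) block in the
# style of Wide ResNet, the base design named for SRNet. Illustrative only:
# layer names, widths, and strides are assumptions, not the paper's code.
import torch
import torch.nn as nn


class WideResidualBlock(nn.Module):
    """BN -> ReLU -> Conv, twice, with an identity (skip) shortcut added back."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection when the shortcut must change shape; identity otherwise.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                      stride=stride, bias=False)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        # The skip connection: adding x back breaks the symmetries that let
        # hidden units become permutable or collapse to zero, the degeneracies
        # (singularities) in the loss landscape that the paper targets.
        return out + self.shortcut(x)


if __name__ == "__main__":
    block = WideResidualBlock(16, 32, stride=2)
    x = torch.randn(1, 16, 32, 32)   # CIFAR-sized input feature map
    print(block(x).shape)            # torch.Size([1, 32, 16, 16])
```

The identity shortcut is the essential ingredient: it keeps gradients flowing directly to earlier layers and lifts the degenerate (singular) regions of the loss landscape that otherwise slow learning, which is the effect the paper studies when varying where such links are placed.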



Acknowledgements

I would like to acknowledge Tony Pridmore, Michael Pound, Khan Faraz, Mohammadreza Soltaninejad and John Atanbori of the Computer Vision Laboratory, School of Computer Science, University of Nottingham, for their insightful discussions.

Author information


Corresponding author

Correspondence to Robail Yasrab.

Electronic supplementary material

ESM 1

(PDF 654 kb)


About this article


Cite this article

Yasrab, R. SRNET: A Shallow Skip Connection Based Convolutional Neural Network Design for Resolving Singularities. J. Comput. Sci. Technol. 34, 924–938 (2019). https://doi.org/10.1007/s11390-019-1950-8

