
LST-Net: Learning a Convolutional Neural Network with a Learnable Sparse Transform

  • Conference paper
  • First Online:
Computer Vision – ECCV 2020 (ECCV 2020)

Abstract

The 2D convolutional (Conv2d) layer is the fundamental element of a deep convolutional neural network (CNN). Despite the great success of CNNs, the conventional Conv2d layer remains limited in its ability to reduce the spatial and channel-wise redundancy of features. In this paper, we propose to mitigate this issue by learning a CNN with a learnable sparse transform (LST), which converts the input features into a more compact and sparser domain so that spatial and channel-wise redundancy can be reduced more effectively. The proposed LST can be efficiently implemented with existing CNN modules, such as point-wise and depth-wise separable convolutions, and it is portable to existing CNN architectures for seamless training and inference. We further present a hybrid soft-thresholding and ReLU (ST-ReLU) activation scheme, which makes the trained network, namely LST-Net, more robust to image corruptions at the inference stage. Extensive experiments on the CIFAR-10/100, ImageNet, ImageNet-C and Places365-Standard datasets validate that the proposed LST-Net achieves higher accuracy than its counterpart networks with fewer parameters and less overhead.
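The abstract's hybrid soft-thresholding/ReLU idea can be illustrated with a minimal NumPy sketch. Note this is an assumption-laden illustration of how the two operations could compose (soft-threshold the response, then clip negatives), not the paper's exact ST-ReLU formulation; the threshold value `tau` here is an arbitrary illustrative constant, whereas in the paper it would be tied to the learned transform.

```python
import numpy as np

def soft_threshold(x, tau):
    """Classical soft thresholding: shrink |x| toward zero by tau, keeping the sign."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def st_relu(x, tau):
    """Illustrative hybrid ST-ReLU (assumed composition): soft-threshold first,
    suppressing small-magnitude responses of either sign, then clip negatives
    as ReLU does. Not necessarily the paper's exact definition."""
    return np.maximum(soft_threshold(x, tau), 0.0)

x = np.array([-2.0, -0.3, 0.2, 0.8, 3.0])
print(st_relu(x, tau=0.5))  # small responses (|x| <= 0.5) are zeroed, large positives shrink by 0.5
```

Compared with plain ReLU, such a hybrid also zeroes small positive responses, which is the intuition behind the claimed robustness to input corruptions: low-magnitude, likely-noise coefficients in the sparse domain are discarded rather than passed through.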

L. Li and K. Wang—The first two authors contributed equally to this work.

This work is supported by HK RGC General Research Fund (PolyU 152216/18E).
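To make the "point-wise and depth-wise separable convolutions" mentioned in the abstract concrete, the following is a small NumPy sketch of those two generic building blocks (per-channel spatial filtering followed by 1x1 channel mixing). This is a textbook sketch of depthwise-separable convolution under valid padding, not the paper's actual LST; all shapes and kernels are illustrative assumptions.

```python
import numpy as np

def depthwise_conv2d(x, k):
    """Per-channel 2D convolution, valid padding. x: (C, H, W), k: (C, kh, kw)."""
    C, H, W = x.shape
    _, kh, kw = k.shape
    out = np.zeros((C, H - kh + 1, W - kw + 1))
    for c in range(C):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[c, i:i + kh, j:j + kw] * k[c])
    return out

def pointwise_conv2d(x, w):
    """1x1 convolution mixing channels. x: (C, H, W), w: (C_out, C)."""
    return np.tensordot(w, x, axes=([1], [0]))

# One separable "transform" stage: spatial filtering per channel, then channel mixing.
x = np.random.randn(4, 8, 8)    # 4 input channels, 8x8 feature map
dk = np.random.randn(4, 3, 3)   # one 3x3 kernel per channel
pw = np.random.randn(6, 4)      # mix 4 channels into 6
y = pointwise_conv2d(depthwise_conv2d(x, dk), pw)
print(y.shape)  # (6, 6, 6)
```

The appeal of this factorization is cost: a full Conv2d entangles spatial and channel dimensions in one large kernel, while the depthwise/pointwise split handles them separately with far fewer parameters, which is what makes it a practical vehicle for a learnable transform.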




Author information


Corresponding author

Correspondence to Lei Zhang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 242 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, L., Wang, K., Li, S., Feng, X., Zhang, L. (2020). LST-Net: Learning a Convolutional Neural Network with a Learnable Sparse Transform. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12355. Springer, Cham. https://doi.org/10.1007/978-3-030-58607-2_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58607-2_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58606-5

  • Online ISBN: 978-3-030-58607-2

  • eBook Packages: Computer Science, Computer Science (R0)
