
Non-smooth Bayesian learning for artificial neural networks

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

Artificial neural networks (ANNs) are widely used in supervised machine learning to analyze signals and images in many applications. Given an annotated learning database, one of the main challenges is to optimize the network weights. A large body of work addresses optimization in machine learning, including gradient-based, Newton-type, and meta-heuristic methods. For the sake of efficiency, regularization is generally used. When non-smooth regularizers are employed to promote sparse networks, such as the \(\ell _1\) norm, optimization becomes challenging because the target criterion is no longer differentiable. In this paper, we propose an MCMC-based optimization scheme formulated in a Bayesian framework. The proposed scheme solves this sparse optimization problem using an efficient sampling scheme and Hamiltonian dynamics. The designed optimizer is evaluated on four datasets, and the results are verified by a comparative study with two CNNs. Promising results show that the proposed method allows ANNs, even with low complexity levels, to reach accuracy rates of up to \(94\%\). The proposed method is also more robust to overfitting, and its training step is much faster than that of all competing algorithms.
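For intuition, the ingredients named in the abstract — an \(\ell _1\)-penalized criterion that is non-differentiable, handled through a proximity operator inside Hamiltonian dynamics with a Metropolis correction — can be sketched in a few lines. The snippet below is a minimal illustration in that spirit, not the paper's implementation: the quadratic toy loss, the function names, the step size, and the trajectory length are all assumptions made for the example.

```python
import numpy as np

# Toy smooth data-fidelity term f(w) = 0.5 * ||Xw - y||^2, standing in
# for the network's training loss (illustrative assumption).
def f(w, X, y):
    r = X @ w - y
    return 0.5 * r @ r

def grad_f(w, X, y):
    return X.T @ (X @ w - y)

# Proximity operator of the non-smooth term g(w) = lam * ||w||_1:
# component-wise soft thresholding.
def prox_l1(w, step, lam):
    return np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)

def ns_hmc(w0, X, y, lam, n_iter=2000, eps=1e-2, n_leapfrog=20, rng=None):
    """Sketch of a non-smooth HMC sampler targeting
    p(w) proportional to exp(-f(w) - lam * ||w||_1).
    The leapfrog integrator uses grad_f for the smooth part and the
    prox of the l1 term for the non-smooth part, followed by a
    Metropolis accept/reject test on the total energy."""
    rng = np.random.default_rng() if rng is None else rng
    U = lambda v: f(v, X, y) + lam * np.abs(v).sum()  # potential energy
    w, samples = w0.copy(), []
    for _ in range(n_iter):
        p = rng.standard_normal(w.size)               # resample momentum
        w_new, p_new = w.copy(), p.copy()
        for _ in range(n_leapfrog):
            p_new -= 0.5 * eps * grad_f(w_new, X, y)
            w_new = prox_l1(w_new + eps * p_new, eps, lam)
            p_new -= 0.5 * eps * grad_f(w_new, X, y)
        # Metropolis correction on H(w, p) = U(w) + 0.5 * ||p||^2
        dH = U(w_new) + 0.5 * p_new @ p_new - U(w) - 0.5 * p @ p
        if np.log(rng.random()) < -dH:
            w = w_new
        samples.append(w.copy())
    return np.array(samples)

# Usage on synthetic sparse data: only 3 of 10 weights are non-zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10); w_true[:3] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(50)
chain = ns_hmc(np.zeros(10), X, y, lam=1.0, rng=rng)
w_hat = chain[len(chain) // 2:].mean(axis=0)          # discard burn-in
print(np.round(w_hat, 2))
```

The point the sketch makes concrete is that the soft-thresholding (proximity) step stands in for the gradient of the \(\ell _1\) term, which does not exist at zero; a plain leapfrog integrator would need that gradient everywhere, which is exactly the non-differentiability issue described above.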


Notes

  1. https://www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset.

  2. https://www.kaggle.com/luisblanche/covidct.


Author information

Corresponding author

Correspondence to Mohamed Fakhfakh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Fakhfakh, M., Chaari, L., Bouaziz, B. et al. Non-smooth Bayesian learning for artificial neural networks. J Ambient Intell Human Comput 14, 13813–13831 (2023). https://doi.org/10.1007/s12652-022-04073-8
