
Non-smooth Bayesian learning for artificial neural networks

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

Artificial neural networks (ANNs) are widely used in supervised machine learning to analyze signals and images in many applications. Given an annotated learning database, one of the main challenges is to optimize the network weights. A large body of work addresses optimization in machine learning, including gradient-based, Newton-type, and meta-heuristic methods. For the sake of efficiency, regularization is generally used. When non-smooth regularizers are employed to promote sparse networks, such as the \(\ell _1\) norm, optimization becomes challenging because the target criterion is no longer differentiable. In this paper, we propose an MCMC-based optimization scheme formulated in a Bayesian framework. The proposed scheme solves this sparse optimization problem using an efficient sampling scheme and Hamiltonian dynamics. The designed optimizer is evaluated on four datasets, and the results are verified by a comparative study with two CNNs. Promising results show that the proposed method allows ANNs, even with low complexity levels, to reach accuracy rates of up to \(94\%\). The proposed method is also more robust to overfitting, and its training step is much faster than that of all competing algorithms.
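For intuition, the ingredients named in the abstract — an \(\ell _1\)-penalized criterion that is non-differentiable, handled through a proximity operator inside Hamiltonian dynamics with a Metropolis correction — can be sketched in a few lines. The snippet below is a minimal illustration in that spirit, not the paper's implementation: the quadratic toy loss, the function names, the step size, and the trajectory length are all assumptions made for the example.

```python
import numpy as np

# Toy smooth data-fidelity term f(w) = 0.5 * ||Xw - y||^2, standing in
# for the network's training loss (illustrative assumption).
def f(w, X, y):
    r = X @ w - y
    return 0.5 * r @ r

def grad_f(w, X, y):
    return X.T @ (X @ w - y)

# Proximity operator of the non-smooth term g(w) = lam * ||w||_1:
# component-wise soft thresholding.
def prox_l1(w, step, lam):
    return np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)

def ns_hmc(w0, X, y, lam, n_iter=2000, eps=1e-2, n_leapfrog=20, rng=None):
    """Sketch of a non-smooth HMC sampler targeting
    p(w) proportional to exp(-f(w) - lam * ||w||_1).
    The leapfrog integrator uses grad_f for the smooth part and the
    prox of the l1 term for the non-smooth part, followed by a
    Metropolis accept/reject test on the total energy."""
    rng = np.random.default_rng() if rng is None else rng
    U = lambda v: f(v, X, y) + lam * np.abs(v).sum()  # potential energy
    w, samples = w0.copy(), []
    for _ in range(n_iter):
        p = rng.standard_normal(w.size)               # resample momentum
        w_new, p_new = w.copy(), p.copy()
        for _ in range(n_leapfrog):
            p_new -= 0.5 * eps * grad_f(w_new, X, y)
            w_new = prox_l1(w_new + eps * p_new, eps, lam)
            p_new -= 0.5 * eps * grad_f(w_new, X, y)
        # Metropolis correction on H(w, p) = U(w) + 0.5 * ||p||^2
        dH = U(w_new) + 0.5 * p_new @ p_new - U(w) - 0.5 * p @ p
        if np.log(rng.random()) < -dH:
            w = w_new
        samples.append(w.copy())
    return np.array(samples)

# Usage on synthetic sparse data: only 3 of 10 weights are non-zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10); w_true[:3] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(50)
chain = ns_hmc(np.zeros(10), X, y, lam=1.0, rng=rng)
w_hat = chain[len(chain) // 2:].mean(axis=0)          # discard burn-in
print(np.round(w_hat, 2))
```

The point the sketch makes concrete is that the soft-thresholding (proximity) step stands in for the gradient of the \(\ell _1\) term, which does not exist at zero; a plain leapfrog integrator would need that gradient everywhere, which is exactly the non-differentiability issue described above.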


Notes

  1. https://www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset.

  2. https://www.kaggle.com/luisblanche/covidct.


Author information

Corresponding author

Correspondence to Mohamed Fakhfakh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Fakhfakh, M., Chaari, L., Bouaziz, B. et al. Non-smooth Bayesian learning for artificial neural networks. J Ambient Intell Human Comput 14, 13813–13831 (2023). https://doi.org/10.1007/s12652-022-04073-8
