Abstract
Classifying data samples into their respective categories is a challenging task, especially when the dataset has many features but few samples. A robust model is essential for the accurate classification of such data. The logistic sigmoid model is one of the simplest models for binary classification. Among the various techniques for optimizing the sigmoid function, the Adam optimizer iteratively updates network weights based on training data. However, the traditional Adam optimizer fails to converge within a given number of epochs when the initial parameter values lie on a gentle region of the error surface. Continuous movement of the convergence curve in the direction of accumulated history can overshoot the goal and oscillate back and forth before settling at the global minimum. Moreover, the traditional Adam optimizer with a high learning rate collapses after several epochs on high-dimensional datasets. The proposed Improved Adam (iAdam) technique combines a look-ahead mechanism with an adaptive learning rate for each parameter: it improves the momentum of traditional Adam by evaluating the gradient after applying the current velocity, thereby acting as a correction factor to Adam's momentum. iAdam works efficiently for high-dimensional datasets and converges to a considerably smaller error within the specified epochs, even at higher learning rates. Comparison with several traditional methods demonstrates that iAdam is suitable for classifying high-dimensional data and also prevents the model from overfitting by effectively handling the bias-variance trade-off.
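The update rule the abstract describes (a Nesterov-style look-ahead combined with Adam's per-parameter adaptive learning rate) can be sketched as below. This is a minimal illustration under the usual Adam conventions, not the paper's exact formulation; the function name `iadam_step` and the look-ahead form are assumptions for illustration.

```python
import numpy as np

def iadam_step(theta, grad_fn, m, v, t, lr=0.01,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One look-ahead Adam update (sketch).

    grad_fn: returns the gradient at a given parameter vector.
    m, v: first/second moment estimates; t: timestep (1-based).
    """
    # Look-ahead: evaluate the gradient after applying the current
    # velocity, rather than at the current parameters, so the momentum
    # term is corrected before it can overshoot.
    lookahead = theta - lr * beta1 * m / (np.sqrt(v) + eps)
    g = grad_fn(lookahead)

    # Standard Adam moment updates with bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Per-parameter adaptive step.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Because the gradient is taken at the look-ahead point, a velocity that points past the minimum immediately produces a counteracting gradient, which is what damps the back-and-forth oscillation the abstract attributes to plain Adam at high learning rates.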




Acknowledgements
This study was funded by the Department of Science and Technology, India under the Interdisciplinary Cyber Physical Systems (ICPS) scheme (Grant no. T-54).
Cite this article
Khaire, U.M., Dhanalakshmi, R. High-dimensional microarray dataset classification using an improved adam optimizer (iAdam). J Ambient Intell Human Comput 11, 5187–5204 (2020). https://doi.org/10.1007/s12652-020-01832-3