Abstract
The parallelization of optimization algorithms is of paramount importance in large-scale machine learning. In this paper, we implement Adaptive learning rate Stochastic Gradient Descent (A-SGD) in a synchronous, parallel manner and incorporate a Variance Reduction (VR) strategy to accelerate convergence. Our approach addresses the complexity associated with high-dimensional datasets, particularly in the context of Logistic Regression (LR) and Support Vector Machine (SVM) classification. First, we use the Histogram of Oriented Gradients (HOG) to extract high-dimensional sparse features from a dataset for blindness detection. We then employ LR and SVM as our classifiers and apply Synchronous A-SGD (SA-SGD) and Synchronous Adaptive Stochastic Variance Reduction Gradient (SA-SVRG) to solve the resulting optimization problems. Our experimental results indicate that SA-SGD and SA-SVRG perform notably better when executed on a cluster than on a single node.
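To illustrate the kind of update underlying this approach, the following is a minimal single-node sketch that combines an SVRG-style variance-reduced gradient with an AdaGrad-like coordinate-wise adaptive learning rate for L2-regularized logistic regression. It is an assumed illustration of the general technique, not the authors' SA-SVRG implementation; in the synchronous parallel setting described above, the full-gradient pass and inner-loop updates would instead be distributed across Spark partitions and aggregated at each epoch. All function and parameter names (logistic_grad, adaptive_svrg, eta, lam) are hypothetical.

```python
# Minimal sketch: SVRG-style variance reduction with an AdaGrad-like adaptive step,
# applied to L2-regularized logistic regression on a single node (illustrative only).
import numpy as np

def logistic_grad(w, X, y, lam=1e-4):
    # Gradient of mean log(1 + exp(-y * Xw)) + (lam/2)||w||^2, with labels y in {-1, +1}.
    z = np.clip(y * (X @ w), -500, 500)  # clip to avoid overflow in exp
    return -(X.T @ (y / (1.0 + np.exp(z)))) / len(y) + lam * w

def adaptive_svrg(X, y, epochs=10, inner=None, eta=0.5, eps=1e-8, lam=1e-4):
    n, d = X.shape
    inner = inner or n
    w = np.zeros(d)
    acc = np.zeros(d)  # accumulated squared gradients for the AdaGrad-like step size
    for _ in range(epochs):
        w_snap = w.copy()
        mu = logistic_grad(w_snap, X, y, lam)  # full gradient at the snapshot
        for _ in range(inner):
            i = np.random.randint(n)
            xi, yi = X[i:i + 1], y[i:i + 1]
            # Variance-reduced stochastic gradient.
            g = logistic_grad(w, xi, yi, lam) - logistic_grad(w_snap, xi, yi, lam) + mu
            acc += g * g
            w -= eta / np.sqrt(acc + eps) * g  # coordinate-wise adaptive step
    return w

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = np.sign(X @ rng.normal(size=20) + 0.1 * rng.normal(size=500))
w = adaptive_svrg(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```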

















Data availability
The APTOS 2019 Blindness Detection dataset used in the experiment is sourced from the official website of Kaggle: https://www.kaggle.com/c/aptos2019-blindness-detection.
Code availability
Enquiries about code availability should be directed to the authors.
Funding
This work received financial support from the project of Ningxia Higher Education Institutions (NYG2024093) and the Innovation Project for Postgraduate Students of North Minzu University (YCX24094).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Chuandong Qin and Yiqing Zhang. The first draft of the manuscript was written by Yiqing Zhang, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript, agreed with its content, and gave explicit consent to submit. Chuandong Qin: made substantial contributions to the conception and design of the work and approved the version to be published. Yiqing Zhang: performed the acquisition, analysis, and interpretation of data, created the new software used in the work, and drafted the work and revised it critically for important intellectual content.
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “Large-scale Machine Learning with Synchronous Parallel Adaptive Stochastic Variance Reduction Gradient Descent for High-dimensional Blindness Detection on Spark”.
Ethical approval
This is an observational study. No ethical approval is required.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent to publish
Additional informed consent was obtained from all individual participants for whom identifying information is included in this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qin, C., Zhang, Y. & Cao, Y. Large-scale machine learning with synchronous parallel adaptive stochastic variance reduction gradient descent for high-dimensional blindness detection on spark. J Supercomput 81, 590 (2025). https://doi.org/10.1007/s11227-025-07046-8