ABSTRACT
The traditional machine learning model can be formulated as an empirical risk minimization problem, which is typically optimized via stochastic gradient descent (SGD). With the emergence of big data, distributed optimization, e.g., distributed SGD, has attracted increasing attention as a means of scaling machine learning models to big data analytics. However, existing distributed optimization methods mainly focus on the standard empirical risk minimization problem and cannot handle emerging machine learning models that fall outside this category. This tutorial therefore focuses on stochastic minimax optimization, stochastic bilevel optimization, and stochastic compositional optimization, which together cover a wide range of emerging machine learning models, e.g., model-agnostic meta-learning, adversarially robust machine learning, and imbalanced data classification. Since these models have been widely used in big data analytics, a comprehensive introduction to the new distributed optimization algorithms designed for them is needed. The goal of this tutorial is thus to present the state of the art and recent advances in distributed minimax optimization, distributed bilevel optimization, and distributed compositional optimization. In particular, we introduce typical applications in each category and discuss the corresponding distributed optimization algorithms in both centralized and decentralized settings. Through this tutorial, researchers will be exposed to fundamental algorithmic designs and basic convergence theories, and practitioners will learn how to apply these algorithms to real-world data mining applications.
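To make the contrast with standard distributed SGD concrete, the sketch below illustrates stochastic gradient descent ascent (SGDA), the basic template behind the distributed minimax algorithms covered in the tutorial, on a toy strongly-convex-strongly-concave saddle-point problem. The problem, the parameter-server-style gradient averaging, and all function names here are illustrative assumptions for exposition, not taken from any specific algorithm in the tutorial.

```python
import random

# Toy saddle-point problem: f(x, y) = 0.5*x**2 + x*y - 0.5*y**2,
# strongly convex in x and strongly concave in y, with its unique
# saddle point at (0, 0).

def stochastic_grads(x, y, noise=0.05):
    """Return noisy partial gradients, mimicking a minibatch estimate."""
    gx = x + y + random.gauss(0, noise)   # df/dx
    gy = x - y + random.gauss(0, noise)   # df/dy
    return gx, gy

def distributed_sgda(num_workers=4, steps=2000, lr=0.1, seed=0):
    """Parameter-server-style SGDA: each worker computes a stochastic
    gradient, the server averages them, then takes a descent step in
    the minimization variable x and an ascent step in the
    maximization variable y."""
    random.seed(seed)
    x, y = 3.0, -2.0
    for _ in range(steps):
        gxs, gys = zip(*(stochastic_grads(x, y) for _ in range(num_workers)))
        gx = sum(gxs) / num_workers
        gy = sum(gys) / num_workers
        x -= lr * gx   # descent on x
        y += lr * gy   # ascent on y
    return x, y

x, y = distributed_sgda()
print(x, y)  # both iterates end up close to the saddle point (0, 0)
```

Averaging the workers' stochastic gradients before each update is the simplest centralized scheme; the decentralized algorithms discussed in the tutorial instead exchange information only with network neighbors.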
Index Terms
- Distributed Optimization for Big Data Analytics: Beyond Minimization