Abstract
In this paper, we improve the regret bound for online kernel selection under bandit feedback. The previous algorithm enjoys an \(O((\Vert f\Vert ^2_{\mathcal {H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})\) expected bound for Lipschitz loss functions. We prove two types of regret bounds that improve on this result. For smooth loss functions, we propose an algorithm with an \(O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum ^K_{i=1}L_T(f^*_i))^{\frac{2}{3}})\) expected bound, where \(L_T(f^*_i)\) is the cumulative loss of the optimal hypothesis in \(\mathbb {H}_{i} =\{f\in \mathcal {H}_i:\Vert f\Vert _{\mathcal {H}_i}\le U\}\). This data-dependent bound preserves the previous worst-case bound and is smaller if most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an \(O(U\sqrt{KT}\ln ^{\frac{2}{3}}{T})\) expected bound that asymptotically improves the previous bound. We apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds that match or improve the previous \(O(\sqrt{T\ln {K}} +\Vert f\Vert ^2_{\mathcal {H}_i}\max \{\sqrt{T},\frac{T}{\sqrt{\mathcal {R}}}\})\) expected bound, where \(\mathcal {R}\) is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
This work was supported in part by the National Natural Science Foundation of China under grants No. 62076181.
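To make the setting concrete, the following is a minimal, hypothetical sketch of online kernel selection under bandit feedback: an EXP3-style sampling distribution over K candidate kernels, where only the sampled kernel's loss is observed each round, importance-weighted loss estimates drive the weight update, and each kernel's hypothesis is updated by a functional gradient step. All names (`BanditKernelSelection`, `gaussian_kernel`, the learning rates) are illustrative assumptions, not the algorithms proposed in the paper.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Gaussian kernel with bandwidth sigma (one candidate per sigma).
    return lambda x, y: np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

class BanditKernelSelection:
    """EXP3-style online kernel selection under bandit feedback (sketch).

    Each candidate kernel keeps its own expansion f_i = sum_j a_j k_i(s_j, .).
    Per round: sample a kernel, predict, observe only that kernel's loss,
    update the sampling weights with an importance-weighted loss estimate,
    and take a functional gradient step on the sampled hypothesis.
    """

    def __init__(self, kernels, eta_w=0.1, eta_f=0.1, gamma=0.1):
        self.kernels = kernels          # list of kernel functions
        self.K = len(kernels)
        self.eta_w = eta_w              # bandit (weight) learning rate
        self.eta_f = eta_f              # hypothesis learning rate
        self.gamma = gamma              # uniform exploration rate
        self.logw = np.zeros(self.K)    # log-weights over kernels
        self.expansions = [[] for _ in range(self.K)]  # (coef, point) pairs

    def _probs(self):
        w = np.exp(self.logw - self.logw.max())  # stable softmax
        p = w / w.sum()
        return (1 - self.gamma) * p + self.gamma / self.K

    def predict(self, x):
        p = self._probs()
        i = np.random.choice(self.K, p=p)        # sample one kernel
        f_x = sum(a * self.kernels[i](s, x) for a, s in self.expansions[i])
        return i, f_x, p[i]

    def update(self, i, p_i, x, f_x, y):
        loss = (f_x - y) ** 2                    # observed only for kernel i
        self.logw[i] -= self.eta_w * loss / p_i  # importance-weighted update
        grad = 2.0 * (f_x - y)                   # d(loss)/d(f(x))
        self.expansions[i].append((-self.eta_f * grad, x))
        return loss
```

Under full-information feedback every kernel's loss would be observed each round; the bandit setting above observes one, which is what drives the extra dependence on K in the regret bounds discussed in the abstract.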
Notes
- 1.
\(\textrm{poly}(\Vert f\Vert _{\mathcal {H}_i})=\Vert f\Vert ^2_{\mathcal {H}_i}+1\). The original paper shows a \(O((\Vert f\Vert ^2_{\mathcal {H}_i}+1)\sqrt{KT})\) expected regret bound. We will clarify the difference in Sect. 2.
- 2.
- 3.
- 4.
The codes are available at https://github.com/JunfLi-TJU/OKS-Bandit.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Li, J., Liao, S. (2023). Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science(), vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2
eBook Packages: Computer Science (R0)