Abstract
We study supervised classification for data streams with a large number of input variables. The basic naïve Bayes classifier is attractive for its simplicity and for its performance when its strong assumption of conditional independence holds. Variable selection and model averaging are two common ways to improve this model, and both lead to a weighted naïve Bayes classifier. Here we focus on the direct estimation of weighted naïve Bayes classifiers. We propose a sparse regularization of the model log-likelihood that takes into account prior knowledge about each input variable. Since the sparsely regularized likelihood is non-convex, we propose an online gradient algorithm that uses mini-batches and random perturbations, following a metaheuristic, to escape local minima. In our experiments, we first assess the quality of the optimization and then study the classifier's performance under varying parameterizations. The results confirm the effectiveness of our approach.
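The weighted naïve Bayes posterior underlying this approach can be sketched as follows. This is a minimal illustration, not the authors' implementation; the structure of `priors`, `cond_probs`, and `weights` is an assumption chosen for clarity. Per-variable weights `w_k` interpolate between dropping a variable (`w_k = 0`, i.e. variable selection) and using it fully (`w_k = 1`).

```python
import math

def weighted_nb_posterior(x, priors, cond_probs, weights):
    """Posterior P(C_j | x) for a weighted naive Bayes classifier.

    x          : list of observed values, one per input variable
    priors     : list of class priors P(C_j)
    cond_probs : cond_probs[j][k] maps values of variable k to p(x_k | C_j)
    weights    : per-variable weights w_k in [0, 1]
    """
    log_scores = []
    for j, prior in enumerate(priors):
        s = math.log(prior)
        for k, xk in enumerate(x):
            # each variable contributes w_k * log p(x_k | C_j)
            s += weights[k] * math.log(cond_probs[j][k][xk])
        log_scores.append(s)
    # normalize with log-sum-exp for numerical stability
    m = max(log_scores)
    exp_scores = [math.exp(s - m) for s in log_scores]
    z = sum(exp_scores)
    return [e / z for e in exp_scores]
```

With all weights equal to 1 this reduces to the standard naïve Bayes posterior; with all weights equal to 0 it returns the class priors, so selective naïve Bayes is a special case of this weighted form.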
Notes
1. In this paper we assume that estimates of the prior probabilities \(P(Y=C_j)\) and of the conditional probabilities \(p(x_k|C_j)\) are available. In our experiments, these probabilities are estimated using univariate discretization or value grouping according to the MODL method (see Boullé 2007b).
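The note above relies on estimated conditional probabilities obtained by univariate discretization (the MODL method of Boullé 2007b). As a rough stand-in for intuition only, a simple equal-width discretization with Laplace smoothing can produce estimates of the same shape; this sketch does not implement the MODL criterion:

```python
def estimate_cond_probs(values, labels, n_classes, n_bins=5):
    """Estimate p(bin | C_j) for one numeric variable via
    equal-width discretization with Laplace smoothing.

    Returns (probs, bin_of): probs[j][b] = p(bin b | C_j),
    and bin_of(v) maps a raw value to its bin index.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant variables

    def bin_of(v):
        return min(int((v - lo) / width), n_bins - 1)

    # count occurrences of each bin per class
    counts = [[0] * n_bins for _ in range(n_classes)]
    for v, y in zip(values, labels):
        counts[y][bin_of(v)] += 1
    # Laplace smoothing: add 1 to every cell before normalizing
    probs = [[(c + 1) / (sum(row) + n_bins) for c in row]
             for row in counts]
    return probs, bin_of
```

The resulting `probs[j][bin_of(x_k)]` plays the role of \(p(x_k|C_j)\) in the classifier; MODL instead chooses the discretization itself with a Bayesian model-selection criterion.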
References
Bach, F., & Moulines, E. (2013). Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n). In Neural Information Processing Systems (NIPS) (pp. 773–781).
Bertsekas, D. P. (1976). On the Goldstein-Levitin-Polyak gradient projection method. IEEE Transactions on Automatic Control, 21(2), 174–184.
Boullé, M. (2006). Regularization and averaging of the selective naive Bayes classifier. In International Joint Conference on Neural Networks (pp. 1680–1688).
Boullé, M. (2007a). Compression-based averaging of selective naive Bayes classifiers. Journal of Machine Learning Research, 8, 1659–1685.
Boullé, M. (2007b). Recherche d'une représentation des données efficace pour la fouille des grandes bases de données [Towards an efficient data representation for mining large databases]. PhD thesis, Ecole Nationale Supérieure des Télécommunications.
Dekel, O., Gilad-Bachrach, R., Shamir, O., & Xiao, L. (2012). Optimal distributed online prediction using mini-batches. Journal of Machine Learning Research, 13(1), 165–202.
Gama, J. (2010). Knowledge discovery from data streams (1st ed.). Boca Raton: Chapman & Hall, CRC.
Godec, M., Leistner, C., Saffari, A., & Bischof, H. (2010). On-line random naive Bayes for tracking. In International Conference on Pattern Recognition (ICPR) (pp. 3545–3548). IEEE Computer Society.
Guigourès, R., & Boullé, M. (2011). Optimisation directe des poids de modèles dans un prédicteur bayésien naïf moyenné [Direct optimization of model weights in an averaged naive Bayes predictor]. In 13èmes Journées Francophones "Extraction et Gestion de Connaissances" (EGC 2011) (pp. 77–82).
Hand, D. J., & Yu, K. (2001). Idiot's Bayes: Not so stupid after all? International Statistical Review, 69(3), 385–398.
Hansen, P., & Mladenovic, N. (2001). Variable neighborhood search: Principles and applications. European Journal of Operational Research, 130(3), 449–467.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67.
Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382–417.
Koller, D., & Sahami, M. (1996). Toward optimal feature selection. In International Conference on Machine Learning (pp. 284–292).
Kuncheva, L. I., & Rodríguez, J. J. (2007). Classifier ensembles with a random linear oracle. IEEE Transactions on Knowledge and Data Engineering, 19(4), 500–508.
Lange, K. (2004). Optimization. Springer Texts in Statistics. New York: Springer.
Langley, P., Iba, W., & Thompson, K. (1992). An analysis of Bayesian classifiers. In National Conference on Artificial Intelligence (pp. 223–228).
Nesterov, Y. (2004). Introductory lectures on convex optimization: A basic course. Applied optimization. Boston: Kluwer Academic Publishers.
Nesterov, Y. (2013). Gradient methods for minimizing composite functions. Mathematical Programming, 140(1), 125–161.
Pilanci, M., Wainwright, M. J., & Ghaoui, L. (2015). Sparse learning via boolean relaxations. Mathematical Programming, 151(1), 63–87.
Riedmiller, M., & Braun, H. (1993). A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In IEEE International Conference On Neural Networks (pp. 586–591).
Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical learning with sparsity: The lasso and generalizations. Boca Raton: Chapman and Hall/CRC.
© 2017 Springer International Publishing Switzerland
Cite this chapter
Hue, C., Boullé, M., Lemaire, V. (2017). Online Learning of a Weighted Selective Naive Bayes Classifier with Non-convex Optimization. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45762-8
Online ISBN: 978-3-319-45763-5