
Incremental Weighted Naive Bayes Classifiers for Data Stream

  • Conference paper

Abstract

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem under a naive independence assumption: the explanatory variables (X i ) are assumed to be conditionally independent of each other given the target variable (Y ). Despite this strong assumption, the classifier has proved very effective in many real applications and is often used for supervised classification on data streams. The naive Bayes classifier relies only on the estimation of the univariate conditional probabilities P(X i  | Y). On a data stream, this estimation can be provided by a “supervised quantiles summary.” The literature shows that the naive Bayes classifier can be improved by (1) using a variable selection method or (2) weighting the explanatory variables. Most of these methods are designed for batch (off-line) learning: they need to store all the data in memory and/or require reading each example more than once, and therefore cannot be used on data streams. This paper presents a new method, based on a graphical model, which computes the weights of the input variables using a stochastic estimation. The method is incremental and produces a weighted naive Bayes classifier for data streams. It is compared to the classical naive Bayes classifier on the Large Scale Learning challenge datasets.
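
Since the classifier only needs the class prior and the univariate conditional probabilities, a minimal Python sketch of an incremental weighted naive Bayes for categorical features is given below. All names (IncrementalWeightedNB, learn_one, ...) are illustrative and not from the paper; in particular, the supervised quantiles summary used for numeric attributes and the graphical-model estimation of the weights is not reproduced here, and the weights are simply left at 1 (plain naive Bayes) unless set externally.

```python
# Minimal sketch of an incremental (weighted) naive Bayes classifier on a
# stream of categorical features. Illustrative only: the paper additionally
# handles numeric features with a supervised quantiles summary and learns
# the weights w_i online with a stochastic estimation.
import math
from collections import defaultdict


class IncrementalWeightedNB:
    def __init__(self, n_features, laplace=1.0):
        self.n_features = n_features
        self.laplace = laplace
        self.class_counts = defaultdict(int)                  # N(y)
        self.feature_counts = defaultdict(int)                # N(X_i = v, y)
        self.feature_values = [set() for _ in range(n_features)]
        self.weights = [1.0] * n_features                     # w_i (1.0 = plain NB)

    def learn_one(self, x, y):
        """Update the sufficient statistics with a single example (one pass)."""
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(i, v, y)] += 1
            self.feature_values[i].add(v)

    def predict_proba(self, x):
        """Return P(y | x), proportional to P(y) * prod_i P(x_i | y)^{w_i}."""
        n = sum(self.class_counts.values())
        log_post = {}
        for y, n_y in self.class_counts.items():
            lp = math.log(n_y / n)
            for i, v in enumerate(x):
                num = self.feature_counts[(i, v, y)] + self.laplace
                den = n_y + self.laplace * max(len(self.feature_values[i]), 1)
                lp += self.weights[i] * math.log(num / den)
            log_post[y] = lp
        m = max(log_post.values())                            # log-sum-exp normalization
        z = sum(math.exp(lp - m) for lp in log_post.values())
        return {y: math.exp(lp - m) / z for y, lp in log_post.items()}


# Usage on a toy stream of (x, y) pairs:
model = IncrementalWeightedNB(n_features=2)
for x, y in [(("red", "small"), "A"), (("blue", "large"), "B"), (("red", "large"), "A")]:
    model.learn_one(x, y)
print(model.predict_proba(("red", "small")))   # P(A | x) should dominate
```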




Appendix: Derivative of the Cost Function


The graphical model is built so that the values of P(C k  | X) are obtained directly at the output. The goal is to maximize the likelihood, and therefore to minimize the negative log likelihood. The first step of the calculation is to decompose the softmax: each output can be seen as the succession of two steps, an activation followed by a function of this activation.

Here the activation function can be written as \(O_{k} = f(H_{k}) = \exp(H_{k})\), and the output of the softmax part of the graphical model is \(P_{k} = \frac{O_{k}}{\sum _{j=1}^{K}O_{j}}\). The derivative of the activation function is:

$$\displaystyle{ \frac{\partial O_{k}}{\partial H_{k}} = f'(H_{k}) = \exp(H_{k}) = O_{k} }$$
(5)
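
For concreteness, here is a small Python sketch of this forward pass (illustrative code, not from the paper; H is the vector of activations \(H_{k}\)):

```python
# Forward pass of the softmax output layer described above:
# H_k is the activation of output k, O_k = exp(H_k), P_k = O_k / sum_j O_j.
import math

def softmax_outputs(H):
    m = max(H)                                  # shift for numerical stability
    O = [math.exp(h - m) for h in H]            # O_k = exp(H_k), up to a common constant
    total = sum(O)
    return [o / total for o in O]               # P_k

# Example with three outputs:
H = [1.2, -0.3, 0.5]
P = softmax_outputs(H)
assert abs(sum(P) - 1.0) < 1e-12
```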

Since the cost function is the negative log likelihood, two cases must be considered: (1) the desired output is equal to 1, or (2) the desired output is equal to 0. In what follows we write:

$$\displaystyle{ \frac{\partial \mathrm{Cost}} {\partial H_{k}} = \frac{\partial C} {\partial P_{k}} \frac{\partial P_{k}} {\partial O_{k}} \frac{\partial O_{k}} {\partial H_{k}} }$$
(6)

In the case where the desired output for output k is equal to 1, substituting (5) into (6) gives:

$$\displaystyle{ \frac{\partial \mathrm{Cost}} {\partial H_{k}} = \frac{\partial C} {\partial P_{k}} \frac{\partial P_{k}} {\partial O_{k}} \frac{\partial O_{k}} {\partial H_{k}} = \frac{-1} {P_{k}} \frac{\partial P_{k}} {\partial O_{k}}O_{k} }$$
(7)
$$\displaystyle\begin{array}{rcl} \frac{\partial \mathrm{Cost}} {\partial H_{k}} & =& \frac{-1} {P_{k}}\left [\sum _{l=1,l\neq k}^{K}\left ( \frac{O_{l}} {\big(\sum _{j=1}^{K}O_{j}\big)^{2}}\right )\right ]O_{k} \\ & =& \frac{-1} {P_{k}}\left [\frac{\big(\sum _{j=1}^{K}O_{j}\big) - O_{k}} {\big(\sum _{j=1}^{K}O_{j}\big)^{2}} \right ]O_{k} {}\end{array}$$
(8)
$$\displaystyle\begin{array}{rcl} \frac{\partial \mathrm{Cost}} {\partial H_{k}} & =& \frac{-1} {P_{k}}\left [\frac{\big(\sum _{j=1}^{K}O_{j}\big) - O_{k}} {\big(\sum _{j=1}^{K}O_{j}\big)} \right ] \frac{O_{k}} {\big(\sum _{j=1}^{K}O_{j}\big)} \\ & =& \frac{-1} {P_{k}}\left [1 - \frac{O_{k}} {\big(\sum _{j=1}^{K}O_{j}\big)}\right ] \frac{O_{k}} {\big(\sum _{j=1}^{K}O_{j}\big)} {}\end{array}$$
(9)

Therefore

$$\displaystyle{ \frac{\partial \mathrm{Cost}} {\partial H_{k}} = \frac{-1} {P_{k}}[1 - P_{k}]P_{k} = P_{k} - 1 }$$
(10)

In the case where the desired output for output k is equal to 0, the cost term of that output is zero, so the error is transmitted only through the normalization part of the softmax function. Following similar steps we obtain: \(\frac{\partial \mathrm{Cost}} {\partial H_{k}} = P_{k}\)

Finally, we conclude that \(\frac{\partial \mathrm{Cost}} {\partial H_{k}} = P_{k} - T_{k},\forall k\), where T k is the desired probability and P k the estimated probability. The rest of the calculation of \(\frac{\partial \mathrm{Cost}} {\partial w_{\mathit{ik}}}\) is then straightforward.
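
As a sanity check of this result, the following Python sketch (illustrative, not from the paper) compares the analytic gradient \(P_{k} - T_{k}\) with a central finite-difference approximation of the negative log likelihood:

```python
# Numerical check of dCost/dH_k = P_k - T_k, where Cost is the negative
# log likelihood of the target class under the softmax outputs.
import math

def softmax(H):
    m = max(H)
    e = [math.exp(h - m) for h in H]
    s = sum(e)
    return [x / s for x in e]

def cost(H, target):
    return -math.log(softmax(H)[target])

H = [0.7, -1.1, 0.4]
target = 2                                    # index of the class with T_k = 1
P = softmax(H)
eps = 1e-6
for k in range(len(H)):
    H_plus = list(H);  H_plus[k] += eps
    H_minus = list(H); H_minus[k] -= eps
    numeric = (cost(H_plus, target) - cost(H_minus, target)) / (2 * eps)
    analytic = P[k] - (1.0 if k == target else 0.0)     # P_k - T_k
    assert abs(numeric - analytic) < 1e-6
```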


Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Salperwyck, C., Lemaire, V., Hue, C. (2015). Incremental Weighted Naive Bayes Classifiers for Data Stream. In: Lausen, B., Krolak-Schwerdt, S., Böhmer, M. (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44983-7_16

