Abstract
The naive Bayes classifier is a frequently used method in various natural language processing tasks. Inspired by a modified version of the method called the flexible Bayes classifier, we explore the use of linear feature transformations together with Bayesian classifiers, because they provide an elegant way to endow the classifier with external information that is relevant to the task. While the flexible Bayes classifier uses kernel density estimation to obtain the class-conditional probabilities of continuous-valued attributes, we use linear transformations to smooth the feature frequency counts of discrete-valued attributes. We evaluate the method on the context-sensitive spelling error correction problem using the Reuters corpus. For this task, we define a positional feature transformation and a word feature transformation that take advantage of the positional information of the context words and the part-of-speech information of words, respectively. Our experimental results show that the proposed transformations can improve the performance of Bayesian classifiers in natural language disambiguation tasks, and that incorporating external information via linear feature transformations is a promising research direction.
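The idea of smoothing discrete feature counts with a linear transformation can be illustrated with a minimal sketch. The abstract does not specify the form of the transformations, so everything below is an assumption made for illustration: the Gaussian weighting over context positions, the row normalization, the `alpha` smoothing constant, and all function names are hypothetical and are not taken from the paper.

```python
import math

def positional_transform(n_positions, width):
    """Hypothetical smoothing matrix (an assumption, not the paper's definition).

    Entry [i][j] is a Gaussian weight between context positions i and j,
    and each row is normalized to sum to 1.
    """
    rows = []
    for i in range(n_positions):
        row = [math.exp(-((i - j) ** 2) / (2.0 * width ** 2))
               for j in range(n_positions)]
        total = sum(row)
        rows.append([w / total for w in row])
    return rows

def apply_transform(matrix, counts):
    """Apply the linear map: smoothed = matrix * counts."""
    return [sum(m * c for m, c in zip(row, counts)) for row in matrix]

def class_log_likelihood(class_counts, observed, alpha=1e-3):
    """Multinomial-style naive Bayes score of `observed` given class counts.

    `alpha` is a small additive constant that keeps the logarithm finite.
    """
    total = sum(class_counts) + alpha * len(class_counts)
    return sum(x * math.log((c + alpha) / total)
               for c, x in zip(class_counts, observed))

# Toy data: one context word, counted at five positions around the target.
counts_a = [0, 0, 3, 0, 0]   # class A saw the word at position 2
counts_b = [0, 0, 0, 0, 3]   # class B saw the word at position 4
observed = [0, 1, 0, 0, 0]   # test example: the word appears at position 1

# Without smoothing, neither class has seen the word at position 1.
raw_a = class_log_likelihood(counts_a, observed)
raw_b = class_log_likelihood(counts_b, observed)

# With positional smoothing, nearby positions share count mass.
T = positional_transform(n_positions=5, width=1.0)
smooth_a = class_log_likelihood(apply_transform(T, counts_a), observed)
smooth_b = class_log_likelihood(apply_transform(T, counts_b), observed)

print("raw scores tie:", raw_a == raw_b)
print("smoothed scores favor class A:", smooth_a > smooth_b)
```

In this toy setting the unsmoothed counts cannot separate the two classes, because the word was never observed at the test position. After the transformation, the class whose training occurrences lie closer to that position receives the higher score, which is the kind of external positional information the abstract refers to.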
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pahikkala, T., Boberg, J., Mylläri, A., Salakoski, T. (2006). Incorporating External Information in Bayesian Classifiers Via Linear Feature Transformations. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_41
DOI: https://doi.org/10.1007/11816508_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science (R0)