Naïve Bayes with Higher Order Attributes

Rosell, Bernard; Hellerstein, Lisa

doi:10.1007/978-3-540-24840-8_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3060)

Included in the following conference series:

Conference of the Canadian Society for Computational Studies of Intelligence

Abstract

The popular Naïve Bayes (NB) algorithm is simple and fast. We present a new learning algorithm, Extended Bayes (EB), which is based on Naïve Bayes. EB is still relatively simple, and achieves accuracy equal to or higher than that of NB on a wide variety of the UC-Irvine datasets. EB is based on two interacting ideas. The first is to find sets of seemingly dependent attributes and to add them as new attributes. The second is to exploit “zeroes”, that is, the negative evidence provided by attribute values that do not occur at all in particular classes in the training data. Naïve Bayes handles zeroes by smoothing. In contrast, EB uses them as evidence that a potential class labeling may be wrong.
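The two ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' EB algorithm, whose details the abstract does not give. It only shows, on invented toy data, (a) adding the joint value of two attributes as a new attribute, (b) a standard Naïve Bayes score with Laplace (add-one) smoothing, and (c) counting "zeroes", the attribute values of a query that never occur with a candidate class in training. All function names and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log


def add_pair_attribute(rows, i, j):
    """Append the joint value of attributes i and j as a new attribute."""
    return [row + ((row[i], row[j]),) for row in rows]


def train(rows, labels):
    """Count how often each (attribute index, value) occurs in each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # class -> Counter of (index, value)
    domains = defaultdict(set)           # index -> values seen in training
    for row, label in zip(rows, labels):
        for k, v in enumerate(row):
            value_counts[label][(k, v)] += 1
            domains[k].add(v)
    return class_counts, value_counts, domains


def nb_log_score(row, label, model):
    """Naive Bayes log score with Laplace (add-one) smoothing."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    score = log(class_counts[label] / total)
    for k, v in enumerate(row):
        numerator = value_counts[label][(k, v)] + 1
        denominator = class_counts[label] + len(domains[k])
        score += log(numerator / denominator)
    return score


def count_zeroes(row, label, model):
    """Count attribute values in row never seen with label in training."""
    _, value_counts, _ = model
    return sum(1 for k, v in enumerate(row)
               if value_counts[label][(k, v)] == 0)


# Hypothetical toy data: (outlook, temperature) -> class label.
rows = [("sunny", "hot"), ("sunny", "mild"),
        ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]

# Treat the attribute pair (0, 1) as a new higher-order attribute.
model = train(add_pair_attribute(rows, 0, 1), labels)
query = add_pair_attribute([("sunny", "mild")], 0, 1)[0]

for label in ("no", "yes"):
    print(label, nb_log_score(query, label, model),
          count_zeroes(query, label, model))
```

In this sketch smoothing keeps every score finite, while the zero count is reported separately as a signal against a candidate label. How EB actually combines such evidence with the NB score is not stated in the abstract and is deliberately left open here.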

Author information

Authors and Affiliations

Dept of Computer and Information Science, Polytechnic University, 6 Metrotech Center, Brooklyn, NY, 11201, USA
Bernard Rosell & Lisa Hellerstein

Editor information

Editors and Affiliations

University of Windsor, 401 Sunset Avenue, N9B 3P4, Windsor, Ontario, Canada
Ahmed Y. Tawfik
School of Computer Science, University of Windsor, Windsor, Ontario,
Scott D. Goodwin

About this paper

Cite this paper

Rosell, B., Hellerstein, L. (2004). Naïve Bayes with Higher Order Attributes. In: Tawfik, A.Y., Goodwin, S.D. (eds) Advances in Artificial Intelligence. Canadian AI 2004. Lecture Notes in Computer Science, vol 3060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24840-8_8

DOI: https://doi.org/10.1007/978-3-540-24840-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22004-6
Online ISBN: 978-3-540-24840-8
eBook Packages: Springer Book Archive
