Abstract
Random Forests (RF) is a successful classifier whose performance is comparable to AdaBoost while being more robust. Exploiting two sources of randomness, random inputs (bagging) and random features, makes RF an accurate classifier in several domains. We hypothesize that methods other than classification or regression trees could also benefit from injecting randomness. This paper generalizes the RF framework to other multiclass classification algorithms, namely the well-established MultiNomial Logit (MNL) and Naive Bayes (NB). We propose Random MNL (RMNL) as a new bagged classifier that combines a forest of MNLs, each estimated with randomly selected features. Analogously, we introduce Random Naive Bayes (RNB). We benchmark the predictive performance of RF, RMNL and RNB against state-of-the-art SVM classifiers. RF, RMNL and RNB outperform SVM. Moreover, generalizing RF seems promising, as reflected by the improved predictive performance of RMNL.
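The combination of bagging and random feature selection described above can be illustrated with a minimal sketch of a Random Naive Bayes ensemble. This is not the authors' implementation. The Gaussian likelihood, the averaging of posterior probabilities across ensemble members, and all names (`GaussianNB`, `RandomNaiveBayes`, `make_toy_data`, `n_estimators`, `n_features`) are assumptions made for illustration only; the abstract does not specify how member predictions are combined or how continuous features are handled.

```python
import math
import random


class GaussianNB:
    """Base learner (assumption): Naive Bayes with Gaussian likelihoods."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior, self.stats = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.prior[c] = len(rows) / len(y)
            self.stats[c] = []
            for j in range(len(X[0])):
                col = [r[j] for r in rows]
                mean = sum(col) / len(col)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
                self.stats[c].append((mean, var))
        return self

    def predict_proba(self, x):
        log_post = {}
        for c in self.classes:
            lp = math.log(self.prior[c])
            for v, (mean, var) in zip(x, self.stats[c]):
                lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            log_post[c] = lp
        top = max(log_post.values())  # subtract the max for numerical stability
        unnorm = {c: math.exp(lp - top) for c, lp in log_post.items()}
        z = sum(unnorm.values())
        return {c: u / z for c, u in unnorm.items()}


class RandomNaiveBayes:
    """Bagged NB members, each fitted on a random subset of the features."""

    def __init__(self, n_estimators=25, n_features=2, seed=0):
        self.n_estimators = n_estimators
        self.n_features = n_features
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, d = len(X), len(X[0])
        self.classes = sorted(set(y))
        self.members = []
        for _ in range(self.n_estimators):
            idx = [rng.randrange(n) for _ in range(n)]     # random inputs (bootstrap)
            feats = rng.sample(range(d), self.n_features)  # random features
            Xb = [[X[i][j] for j in feats] for i in idx]
            yb = [y[i] for i in idx]
            self.members.append((feats, GaussianNB().fit(Xb, yb)))
        return self

    def predict_proba(self, x):
        # Assumption: members are combined by averaging their posteriors.
        total = {c: 0.0 for c in self.classes}
        for feats, model in self.members:
            for c, p in model.predict_proba([x[j] for j in feats]).items():
                total[c] += p
        return {c: t / len(self.members) for c, t in total.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)


def make_toy_data(seed=0, per_class=30, n_dims=4):
    """Hypothetical three-class data with well-separated class means."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in enumerate((0.0, 5.0, 10.0)):
        for _ in range(per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_dims)])
            y.append(label)
    return X, y
```

Replacing `GaussianNB` with a multinomial logit learner that exposes the same `fit` and `predict_proba` methods would give an analogous sketch of RMNL.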
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prinzie, A., Van den Poel, D. (2007). Random Multiclass Classification: Generalizing Random Forests to Random MNL and Random NB. In: Wagner, R., Revell, N., Pernul, G. (eds) Database and Expert Systems Applications. DEXA 2007. Lecture Notes in Computer Science, vol 4653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74469-6_35
DOI: https://doi.org/10.1007/978-3-540-74469-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74467-2
Online ISBN: 978-3-540-74469-6
eBook Packages: Computer Science, Computer Science (R0)