Abstract
This paper discusses the problem of multi-relational classification on imbalanced datasets. To address class imbalance, a new multi-relational Naive Bayesian classifier named R-NB is proposed. The attribute filter criterion based on mutual information is upgraded to handle multi-relational data directly, and the basic sampling methods, under-sampling and over-sampling, are adopted to eliminate or minimize rarity by altering the distribution of relational examples. Experiments show that R-NB achieves better results with the attribute filter than without it. Experiments also show that, as measured by the ROC curve, multi-relational classifiers with under-sampling give more accurate results than those with over-sampling.
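The two preprocessing ideas named in the abstract, a mutual-information attribute filter and basic random under-/over-sampling, can be illustrated on a single flat table. The sketch below is a hypothetical, single-relation simplification, not R-NB itself: the abstract does not say how the criterion is carried over to multiple relations, so that step is omitted. The function names, the `threshold` parameter, and the choice to balance every class to the minority (or majority) size are assumptions made for illustration.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def filter_attributes(columns: Dict[str, Sequence[Hashable]],
                      labels: Sequence[Hashable],
                      threshold: float) -> List[str]:
    """Keep attributes whose mutual information with the class label exceeds threshold.

    Hypothetical single-table filter; R-NB's multi-relational version is not shown.
    """
    return [name for name, values in columns.items()
            if mutual_information(values, labels) > threshold]


def _group_by_class(rows, labels) -> Dict[Hashable, list]:
    by_class: Dict[Hashable, list] = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    return by_class


def random_undersample(rows, labels, seed: int = 0) -> List[Tuple[object, Hashable]]:
    """Randomly drop examples so every class shrinks to the minority-class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out = [(row, y) for y, members in by_class.items()
           for row in rng.sample(members, target)]
    rng.shuffle(out)
    return out


def random_oversample(rows, labels, seed: int = 0) -> List[Tuple[object, Hashable]]:
    """Randomly duplicate examples so every class grows to the majority-class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out = []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out.extend((row, y) for row in members + extra)
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0, 0, 0, 0]  # 2 minority vs 6 majority examples
    rows = list(range(len(labels)))
    print(Counter(y for _, y in random_undersample(rows, labels)))
    print(Counter(y for _, y in random_oversample(rows, labels)))
```

Under-sampling discards majority examples, so the training set shrinks; over-sampling keeps everything but repeats minority examples, which can encourage overfitting to those duplicates. Which one helps more is an empirical question, and the abstract reports that under-sampling did better in the authors' ROC-based comparison.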
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, G., Bao, H., Meng, X. (2008). Multi-Relational Classification in Imbalanced Domains. In: Kang, L., Cai, Z., Yan, X., Liu, Y. (eds) Advances in Computation and Intelligence. ISICA 2008. Lecture Notes in Computer Science, vol 5370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92137-0_62
DOI: https://doi.org/10.1007/978-3-540-92137-0_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92136-3
Online ISBN: 978-3-540-92137-0
eBook Packages: Computer Science, Computer Science (R0)