Abstract
This paper presents a method that uses virtual examples to improve classification accuracy for data with nominal attributes. Most previous research on virtual examples has focused on data with numeric attributes and has relied on domain-specific knowledge to generate virtual examples useful to a particular target learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naïve Bayesian network constructed from the given training set. A sampled example is considered useful if adding it to the training set increases the network's conditional likelihood. A set of useful virtual examples can be collected by repeating this process of sampling followed by evaluation. Experiments have shown that virtual examples collected in this way help various learning algorithms derive classifiers of improved accuracy.
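The sampling-followed-by-evaluation loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes nominal attributes encoded as integers, Laplace-smoothed naïve Bayes parameters, and acceptance of a sampled example whenever refitting with it raises the conditional log-likelihood on the original training set. All function names and the exact acceptance criterion are illustrative assumptions.

```python
import math
import random

def fit_nb(data, n_vals, n_classes, alpha=1.0):
    """Fit a naive Bayes model: class priors and per-class attribute
    distributions, with Laplace smoothing controlled by alpha."""
    prior = [alpha] * n_classes
    cond = [[[alpha] * v for v in n_vals] for _ in range(n_classes)]
    for x, y in data:
        prior[y] += 1
        for a, xv in enumerate(x):
            cond[y][a][xv] += 1
    z = sum(prior)
    prior = [p / z for p in prior]
    cond = [[[c / sum(col) for c in col] for col in cls] for cls in cond]
    return prior, cond

def cond_log_lik(data, prior, cond):
    """Conditional log-likelihood: the sum of log P(y | x) over the data."""
    cll = 0.0
    for x, y in data:
        joint = [prior[c] * math.prod(cond[c][a][xv] for a, xv in enumerate(x))
                 for c in range(len(prior))]
        cll += math.log(joint[y] / sum(joint))
    return cll

def sample_virtual(prior, cond, rng):
    """Draw one virtual example: sample a class label from the prior, then
    each attribute value from its class-conditional distribution."""
    y = rng.choices(range(len(prior)), weights=prior)[0]
    x = tuple(rng.choices(range(len(col)), weights=col)[0] for col in cond[y])
    return x, y

def collect_virtual_examples(train, n_vals, n_classes, n_trials=200, seed=0):
    """Repeat sampling followed by evaluation: keep a sampled example only if
    refitting with it increases the conditional log-likelihood measured on
    the original training set."""
    rng = random.Random(seed)
    kept = []
    prior, cond = fit_nb(train, n_vals, n_classes)
    best = cond_log_lik(train, prior, cond)
    for _ in range(n_trials):
        ve = sample_virtual(prior, cond, rng)
        p2, c2 = fit_nb(train + kept + [ve], n_vals, n_classes)
        score = cond_log_lik(train, p2, c2)
        if score > best:
            kept.append(ve)
            best, prior, cond = score, p2, c2
    return kept
```

The collected virtual examples can then be appended to the training set before running any downstream learner, which is what lets the same pool of examples benefit several different classification algorithms.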
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Lee, Y., Kang, J., Kang, B., Ryu, K.R. (2006). Sampling of Virtual Examples to Improve Classification Accuracy for Nominal Attribute Data. In: Greco, S., et al. (eds.) Rough Sets and Current Trends in Computing. RSCTC 2006. Lecture Notes in Computer Science, vol. 4259. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908029_66
Print ISBN: 978-3-540-47693-1
Online ISBN: 978-3-540-49842-1