Sampling of Virtual Examples to Improve Classification Accuracy for Nominal Attribute Data

  • Conference paper
Rough Sets and Current Trends in Computing (RSCTC 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4259)

Abstract

This paper presents a method of using virtual examples to improve classification accuracy for data with nominal attributes. Most previous research on virtual examples has focused on data with numeric attributes and relied on domain-specific knowledge to generate virtual examples useful for one particular target learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naïve Bayesian network constructed from the given training set. A sampled example is considered useful if it increases the network's conditional likelihood when added to the training set. A set of useful virtual examples can be collected by repeating this process of sampling followed by evaluation. Experiments have shown that virtual examples collected in this way help various learning algorithms derive classifiers of improved accuracy.
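
To make the procedure concrete, below is a minimal Python sketch of the sampling-followed-by-evaluation loop the abstract describes. It is an illustration only, not the authors' implementation: the naïve Bayes model with Laplace smoothing, the helper names (fit_nb, conditional_log_likelihood, sample_virtual_example), the choice of measuring conditional log-likelihood on the original training data, and the toy data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_nb(X, y, n_vals, n_classes, alpha=1.0):
    """Fit a naive Bayes model over nominal attributes with Laplace smoothing (assumed)."""
    prior = np.array([(y == c).sum() + alpha for c in range(n_classes)], dtype=float)
    prior /= prior.sum()
    # cond[j][c, v] = P(attribute j takes value v | class c)
    cond = []
    for j, k in enumerate(n_vals):
        counts = np.full((n_classes, k), alpha)
        for xi, yi in zip(X[:, j], y):
            counts[yi, xi] += 1.0
        cond.append(counts / counts.sum(axis=1, keepdims=True))
    return prior, cond

def conditional_log_likelihood(X, y, prior, cond):
    """Sum of log P(class | attributes) over the examples, under the model."""
    total = 0.0
    for xi, yi in zip(X, y):
        log_joint = np.log(prior).copy()
        for j, v in enumerate(xi):
            log_joint += np.log(np.array([cond[j][c, v] for c in range(len(prior))]))
        total += log_joint[yi] - np.log(np.exp(log_joint).sum())
    return total

def sample_virtual_example(prior, cond, n_vals):
    """Draw a class from the prior, then each attribute from its class-conditional."""
    c = rng.choice(len(prior), p=prior)
    x = np.array([rng.choice(k, p=cond[j][c]) for j, k in enumerate(n_vals)])
    return x, c

# Toy nominal data: three attributes with 2, 3, and 2 possible values; two classes.
n_vals, n_classes = [2, 3, 2], 2
X = np.column_stack([rng.integers(0, k, size=30) for k in n_vals])
y = rng.integers(0, n_classes, size=30)

X_aug, y_aug = X.copy(), y.copy()
prior, cond = fit_nb(X_aug, y_aug, n_vals, n_classes)
best_cll = conditional_log_likelihood(X, y, prior, cond)

accepted = 0
for _ in range(200):                                  # sampling followed by evaluation
    xv, cv = sample_virtual_example(prior, cond, n_vals)
    cand_X, cand_y = np.vstack([X_aug, xv]), np.append(y_aug, cv)
    p2, c2 = fit_nb(cand_X, cand_y, n_vals, n_classes)
    cll = conditional_log_likelihood(X, y, p2, c2)    # scored on the original data (assumption)
    if cll > best_cll:                                # keep only samples that raise the CLL
        X_aug, y_aug, prior, cond, best_cll = cand_X, cand_y, p2, c2, cll
        accepted += 1

print(f"accepted {accepted} virtual examples; final CLL = {best_cll:.3f}")
```

In the setting the abstract describes, the accepted virtual examples would then be handed, together with the original data, to other learning algorithms as an enlarged training set; the loop above only illustrates how candidates are generated and filtered.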

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, Y., Kang, J., Kang, B., Ryu, K.R. (2006). Sampling of Virtual Examples to Improve Classification Accuracy for Nominal Attribute Data. In: Greco, S., et al. (eds.) Rough Sets and Current Trends in Computing. RSCTC 2006. Lecture Notes in Computer Science, vol 4259. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908029_66

  • DOI: https://doi.org/10.1007/11908029_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-47693-1

  • Online ISBN: 978-3-540-49842-1
