Sampling of Virtual Examples to Improve Classification Accuracy for Nominal Attribute Data

  • Conference paper
Rough Sets and Current Trends in Computing (RSCTC 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4259)

Abstract

This paper presents a method of using virtual examples to improve classification accuracy for data with nominal attributes. Most previous research on virtual examples has focused on data with numeric attributes and relied on domain-specific knowledge to generate virtual examples useful for one particular target learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naïve Bayesian network constructed from the given training set. A sampled example is considered useful if it increases the network's conditional likelihood when added to the training set. A set of useful virtual examples can be collected by repeating this process of sampling followed by evaluation. Experiments have shown that virtual examples collected in this way help various learning algorithms derive classifiers of improved accuracy.
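
To make the procedure concrete, below is a minimal Python sketch of the sampling-followed-by-evaluation loop the abstract describes. It is an illustration only, not the authors' implementation: the naïve Bayes model with Laplace smoothing, the helper names (fit_nb, conditional_log_likelihood, sample_virtual_example), the choice of measuring conditional log-likelihood on the original training data, and the toy data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_nb(X, y, n_vals, n_classes, alpha=1.0):
    """Fit a naive Bayes model over nominal attributes with Laplace smoothing (assumed)."""
    prior = np.array([(y == c).sum() + alpha for c in range(n_classes)], dtype=float)
    prior /= prior.sum()
    # cond[j][c, v] = P(attribute j takes value v | class c)
    cond = []
    for j, k in enumerate(n_vals):
        counts = np.full((n_classes, k), alpha)
        for xi, yi in zip(X[:, j], y):
            counts[yi, xi] += 1.0
        cond.append(counts / counts.sum(axis=1, keepdims=True))
    return prior, cond

def conditional_log_likelihood(X, y, prior, cond):
    """Sum of log P(class | attributes) over the examples, under the model."""
    total = 0.0
    for xi, yi in zip(X, y):
        log_joint = np.log(prior).copy()
        for j, v in enumerate(xi):
            log_joint += np.log(np.array([cond[j][c, v] for c in range(len(prior))]))
        total += log_joint[yi] - np.log(np.exp(log_joint).sum())
    return total

def sample_virtual_example(prior, cond, n_vals):
    """Draw a class from the prior, then each attribute from its class-conditional."""
    c = rng.choice(len(prior), p=prior)
    x = np.array([rng.choice(k, p=cond[j][c]) for j, k in enumerate(n_vals)])
    return x, c

# Toy nominal data: three attributes with 2, 3, and 2 possible values; two classes.
n_vals, n_classes = [2, 3, 2], 2
X = np.column_stack([rng.integers(0, k, size=30) for k in n_vals])
y = rng.integers(0, n_classes, size=30)

X_aug, y_aug = X.copy(), y.copy()
prior, cond = fit_nb(X_aug, y_aug, n_vals, n_classes)
best_cll = conditional_log_likelihood(X, y, prior, cond)

accepted = 0
for _ in range(200):                                  # sampling followed by evaluation
    xv, cv = sample_virtual_example(prior, cond, n_vals)
    cand_X, cand_y = np.vstack([X_aug, xv]), np.append(y_aug, cv)
    p2, c2 = fit_nb(cand_X, cand_y, n_vals, n_classes)
    cll = conditional_log_likelihood(X, y, p2, c2)    # scored on the original data (assumption)
    if cll > best_cll:                                # keep only samples that raise the CLL
        X_aug, y_aug, prior, cond, best_cll = cand_X, cand_y, p2, c2, cll
        accepted += 1

print(f"accepted {accepted} virtual examples; final CLL = {best_cll:.3f}")
```

In the setting the abstract describes, the accepted virtual examples would then be handed, together with the original data, to other learning algorithms as an enlarged training set; the loop above only illustrates how candidates are generated and filtered.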

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, Y., Kang, J., Kang, B., Ryu, K.R. (2006). Sampling of Virtual Examples to Improve Classification Accuracy for Nominal Attribute Data. In: Greco, S., et al. (eds.) Rough Sets and Current Trends in Computing. RSCTC 2006. Lecture Notes in Computer Science, vol 4259. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908029_66

  • DOI: https://doi.org/10.1007/11908029_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-47693-1

  • Online ISBN: 978-3-540-49842-1
