An Effective Method to Find Better Data Mining Model Using Inferior Class Oversampling

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 206))

Abstract

Decision trees are known to perform very well in classification tasks, and sampling is often used to construct suitable training sets. Among the many factors involved, the accuracy of a generated decision tree depends strongly on the training data set, so we try to find better classification models from the given data sets by oversampling the instances of classes that have higher error rates. The resulting decision trees have better accuracy for the classes that had lower error rates, but worse accuracy for the classes that had higher error rates. In order to take advantage of the improved accuracy and compensate for the degraded accuracy, we suggest using class association rules. Experiments with real-world data sets showed promising results.
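The oversampling step described above can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the function name `oversample_inferior`, the duplication-by-factor scheme, and the use of a per-class error-rate table are all assumptions made for the example.

```python
import random

def oversample_inferior(dataset, error_rates, factor=2.0, seed=0):
    """Duplicate instances of the class with the highest baseline error rate.

    dataset: list of (features, label) pairs
    error_rates: dict mapping label -> error rate of a baseline classifier
    factor: target multiplier for the inferior class's instance count
    """
    rng = random.Random(seed)
    # The "inferior" class is the one the baseline model gets wrong most often.
    inferior = max(error_rates, key=error_rates.get)
    pool = [ex for ex in dataset if ex[1] == inferior]
    # Add enough random duplicates to multiply the class's count by `factor`.
    extra = int((factor - 1.0) * len(pool))
    return dataset + [rng.choice(pool) for _ in range(extra)]

# Toy example: class "b" has the worst baseline error rate, so only "b"
# instances are duplicated before retraining the decision tree.
data = [((0, 1), "a"), ((1, 1), "a"), ((1, 0), "b")]
rates = {"a": 0.05, "b": 0.40}
augmented = oversample_inferior(data, rates, factor=3.0)
```

A decision tree retrained on `augmented` would then see three times as many instances of the inferior class; the paper's contribution is combining such retrained trees with class association rules to keep the accuracy gained on some classes while compensating for losses on others.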





Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

Cite this paper

Sug, H. (2011). An Effective Method to Find Better Data Mining Model Using Inferior Class Oversampling. In: Lee, G., Howard, D., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2011. Communications in Computer and Information Science, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24106-2_73

  • DOI: https://doi.org/10.1007/978-3-642-24106-2_73

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24105-5

  • Online ISBN: 978-3-642-24106-2

  • eBook Packages: Computer Science (R0)
