DOI: 10.1145/2857546.2857643
research-article

A New Under-Sampling Method Using Genetic Algorithm for Imbalanced Data Classification

Published: 04 January 2016

ABSTRACT

The class imbalance problem arises in many real-world domains, where many traditional classifiers often fail to detect minority-class objects because they pay less attention to them. To address this problem, this paper proposes GAUS (genetic algorithm based under-sampling), a new under-sampling technique. GAUS is designed to overcome several limitations of existing methods, such as performance instability and loss of information about the data distribution. To select informative majority objects, GAUS tries to maximize the performance of a prototype classifier such that the prototypes minimize the loss between the distributions of the original and under-sampled majority objects. We confirmed the effectiveness of the proposed GAUS on real-world datasets.
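
The abstract does not spell out GAUS's encoding, fitness function, or genetic operators, so the following is only a minimal sketch of the general idea of genetic-algorithm-based under-sampling, not the authors' method. It assumes that a chromosome is a fixed-size set of majority-object indices, that the prototype classifier is a 1-nearest-neighbour rule, and that fitness is the leave-one-out G-mean over the whole training set. The function names (`loo_gmean`, `ga_undersample`), the operators, and all parameter values are hypothetical, and the sketch does not model the distribution loss separately.

```python
import numpy as np

def loo_gmean(X, y, keep):
    """Leave-one-out 1-NN G-mean when only the objects in `keep` act as prototypes."""
    P = X[keep]
    d = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)  # shape (n, len(keep))
    d[keep, np.arange(len(keep))] = np.inf  # a prototype may not vote for itself
    pred = y[keep][d.argmin(axis=1)]
    recalls = [(pred[y == c] == c).mean() for c in (0, 1)]
    return float(np.sqrt(recalls[0] * recalls[1]))

def ga_undersample(X, y, n_keep=None, pop_size=20, generations=30, seed=0):
    """Assumed GA: choose n_keep majority objects (label 0) to keep with all minority (label 1)."""
    rng = np.random.default_rng(seed)
    maj = np.flatnonzero(y == 0)
    mino = np.flatnonzero(y == 1)
    k = n_keep or len(mino)  # default: balance the two classes

    def score(chrom):
        return loo_gmean(X, y, np.concatenate([chrom, mino]))

    # Each chromosome is an array of k distinct majority indices.
    pop = [rng.choice(maj, size=k, replace=False) for _ in range(pop_size)]
    fit = [score(c) for c in pop]
    for _ in range(generations):
        children = [pop[int(np.argmax(fit))]]  # elitism: carry over the best
        while len(children) < pop_size:
            # Binary tournament selection, once per parent.
            a, b = (max(rng.choice(pop_size, 2, replace=False), key=lambda i: fit[i])
                    for _ in range(2))
            # Crossover: draw k distinct indices from the parents' union.
            pool = np.union1d(pop[a], pop[b])
            child = rng.choice(pool, size=k, replace=False)
            # Mutation: swap one kept object for an unused majority object.
            if rng.random() < 0.2:
                unused = np.setdiff1d(maj, child)
                if len(unused):
                    child[rng.integers(k)] = rng.choice(unused)
            children.append(child)
        pop = children
        fit = [score(c) for c in pop]
    best = pop[int(np.argmax(fit))]
    return np.sort(np.concatenate([best, mino])), max(fit)
```

Calling `ga_undersample(X, y)` on a feature matrix `X` and a 0/1 label vector `y` returns the indices of the retained training objects together with the best fitness found. Holding the number of retained majority objects fixed keeps the search from trivially keeping every majority object, which would otherwise maximize majority recall.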


    Published in
      IMCOM '16: Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication
      January 2016
      658 pages
      ISBN: 9781450341424
      DOI: 10.1145/2857546

      Copyright © 2016 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 213 of 621 submissions, 34%
