Two-Class with Oversampling Versus One-Class Classification for Microarray Datasets

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2016 (ICANN 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9887)

Abstract

Microarray datasets are a challenge for classical computational techniques because the dimensionality of their feature space is very large relative to the small number of samples; in addition, their classes are usually unbalanced. Motivated by this imbalance, previous research showed that one-class classification is superior for handling microarray datasets. This paper presents a new study that tries to improve the behavior of traditional techniques, specifically Support Vector Machines, by applying oversampling techniques. The experimental results show that, even with these methods, the performance of classical classifiers remains below that of the one-class approach.
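
To make the idea of oversampling concrete, the following is a minimal sketch of SMOTE-style interpolation, a widely used way of creating synthetic minority-class samples. The abstract does not state which oversampling variants or parameters the study uses, so the function name `smote_like_oversample`, the neighbour count `k`, and the toy data are all illustrative assumptions rather than the authors' setup.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Hypothetical helper for illustration only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 3 minority samples against 8 majority samples.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
n_majority = 8
new_points = smote_like_oversample(minority, n_new=n_majority - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

After this step the minority class has as many samples as the majority class, and a two-class classifier such as an SVM could be trained on the balanced set. A one-class approach would instead model a single class only and skip the balancing step.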

Acknowledgments

This work has been supported in part by the Secretaría de Estado de Investigación of the Spanish Government (Grant TIN2015-65069-C2-1-R), and by the Xunta de Galicia (Grant GRC2014/035) with the European Union FEDER funds.

Corresponding author

Correspondence to Beatriz Pérez-Sánchez.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Pérez-Sánchez, B., Fontenla-Romero, O., Sánchez-Maroño, N. (2016). Two-Class with Oversampling Versus One-Class Classification for Microarray Datasets. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science, vol 9887. Springer, Cham. https://doi.org/10.1007/978-3-319-44781-0_47

  • DOI: https://doi.org/10.1007/978-3-319-44781-0_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44780-3

  • Online ISBN: 978-3-319-44781-0

  • eBook Packages: Computer Science (R0)
