Two-Class with Oversampling Versus One-Class Classification for Microarray Datasets

Pérez-Sánchez, Beatriz; Fontenla-Romero, Oscar; Sánchez-Maroño, Noelia

doi:10.1007/978-3-319-44781-0_47


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9887)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Microarray datasets are challenging for classical computational techniques because their feature space is very high-dimensional relative to the small number of samples; in addition, their classes are usually unbalanced. Given this imbalance, a previous study showed the superiority of one-class classification for handling microarray datasets. This paper presents a new study that tries to improve the behavior of traditional techniques, specifically Support Vector Machines, by applying oversampling techniques. The experimental results show that, even with these methods, the performance of classical classifiers remains below that of the one-class approach.
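As a rough illustration of what oversampling means here, the sketch below balances a toy two-class dataset by interpolating between minority-class samples, in the style of SMOTE. It is an assumption-laden example, not the authors' procedure: the abstract does not say which oversampling methods were evaluated, and the function name `smote_like_oversample`, the parameter `k`, and the toy data are all hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Illustrative SMOTE-style sketch only; not the method used in the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 6 majority samples, 3 minority samples, 2 features.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[2.0, 2.0], [2.2, 1.9], [1.9, 2.3]]

# Generate just enough synthetic minority samples to match the majority class.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

The balanced set would then be used to train a two-class classifier such as an SVM, whereas a one-class approach would instead be trained on samples from a single class only.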


Acknowledgments

This work has been supported in part by the Secretaría de Estado de Investigación of the Spanish Government (Grant TIN2015-65069-C2-1-R), and by the Xunta de Galicia (Grant GRC2014/035) with the European Union FEDER funds.

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Informatics, University of A Coruña, Campus de Elviña s/n, 15071, A Coruña, Spain
Beatriz Pérez-Sánchez, Oscar Fontenla-Romero & Noelia Sánchez-Maroño

Corresponding author

Correspondence to Beatriz Pérez-Sánchez.

Editor information

Editors and Affiliations

University of Lausanne, Lausanne, Switzerland
Alessandro E.P. Villa
University of Lausanne, Lausanne, Switzerland
Paolo Masulli
Universitat Politècnica de Catalunya, Terrassa, Spain
Antonio Javier Pons Rivero

About this paper

Cite this paper

Pérez-Sánchez, B., Fontenla-Romero, O., Sánchez-Maroño, N. (2016). Two-Class with Oversampling Versus One-Class Classification for Microarray Datasets. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science, vol 9887. Springer, Cham. https://doi.org/10.1007/978-3-319-44781-0_47

DOI: https://doi.org/10.1007/978-3-319-44781-0_47
Published: 13 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44780-3
Online ISBN: 978-3-319-44781-0
eBook Packages: Computer Science, Computer Science (R0)
