Abstract
This paper analyzes the effect of the high-dimensional, low-sample-size problem on cancer classification using gene-expression microarrays. It addresses two key questions: (i) what percentage of genes can ensure highly accurate classification, and (ii) whether this percentage differs from one classifier to another. Both questions are investigated through a set of experiments that combine two gene ranking algorithms, five classifiers and four DNA microarray databases.
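The protocol behind question (i) can be illustrated with a small sketch. It ranks genes with a filter score, keeps only the top-ranked percentage, and measures accuracy at each percentage. This is not the authors' implementation. The abstract does not name the ranking algorithms or classifiers, so every choice below is an assumption: a signal-to-noise ratio ranking, a 1-nearest-neighbour classifier, leave-one-out validation and a small synthetic two-class matrix. All function names are hypothetical.

```python
"""Hypothetical sketch: accuracy as a function of the percentage of top-ranked genes.

Assumptions (not taken from the paper): signal-to-noise gene ranking,
1-nearest-neighbour classification, leave-one-out accuracy, synthetic data."""
import math
import random
from statistics import mean, pstdev


def snr_scores(X, y):
    """Per-gene signal-to-noise ratio |mu0 - mu1| / (sd0 + sd1) for labels 0/1."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, c in zip(X, y) if c == 0]
        b = [row[g] for row, c in zip(X, y) if c == 1]
        denom = pstdev(a) + pstdev(b) or 1e-12  # guard against zero spread
        scores.append(abs(mean(a) - mean(b)) / denom)
    return scores


def rank_genes(X, y):
    """Gene indices sorted from most to least discriminative."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def nn1_predict(train_X, train_y, x, genes):
    """1-NN with squared Euclidean distance restricted to the selected genes."""
    best = min(range(len(train_X)),
               key=lambda i: sum((train_X[i][g] - x[g]) ** 2 for g in genes))
    return train_y[best]


def loo_accuracy(X, y, fraction):
    """Leave-one-out accuracy using the top `fraction` of genes.

    Genes are re-ranked inside every fold, so the held-out sample never
    influences which genes are selected."""
    k = max(1, math.ceil(fraction * len(X[0])))
    correct = 0
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = rank_genes(tr_X, tr_y)[:k]
        correct += nn1_predict(tr_X, tr_y, X[i], genes) == y[i]
    return correct / len(X)


def toy_data(n_per_class=10, n_genes=200, n_informative=5, seed=0):
    """Synthetic 'large G, small n' matrix: only the first genes carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 3.0 * label  # shift informative genes for class 1
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    for pct in (1, 5, 10, 50, 100):
        acc = loo_accuracy(X, y, pct / 100)
        print(f"top {pct:3d}% of genes -> LOO accuracy {acc:.2f}")
```

Re-ranking inside each fold keeps the held-out sample out of the selection step; ranking on the full data set first would make the accuracy estimates optimistic. Swapping `nn1_predict` for other classifiers and comparing the resulting curves corresponds to question (ii).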
Acknowledgment
This work has been partially supported by the Spanish Ministry of Economy [TIN2013-46522-P], the Mexican PRODEP [DSA/103.5/15/7004], and the Generalitat Valenciana [PROMETEOII/2014/062].
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
García, V., Sánchez, J.S., Cleofas-Sánchez, L., Ochoa-Domínguez, H.J., López-Orozco, F. (2017). An Insight on the ‘Large G, Small n’ Problem in Gene-Expression Microarray Classification. In: Alexandre, L., Salvador Sánchez, J., Rodrigues, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2017. Lecture Notes in Computer Science, vol 10255. Springer, Cham. https://doi.org/10.1007/978-3-319-58838-4_53
DOI: https://doi.org/10.1007/978-3-319-58838-4_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58837-7
Online ISBN: 978-3-319-58838-4
eBook Packages: Computer Science, Computer Science (R0)