Abstract
This paper presents a new method of deriving features for sample classification from high-throughput data such as microarray gene expression studies. In these studies the number of features is much larger than the number of samples, so strong dimensionality reduction is essential. Standard approaches attempt to select the subsets of features (genes) with the highest association with the target, and they tend to produce unstable and non-reproducible feature sets. The purpose of this work is to improve feature selection by using prior biological knowledge of potential relationships between features, available, e.g., in signaling pathway databases. We first identify the most activated pathways and then derive pathway-based features from the expression of the up- and down-regulated genes in each pathway. We demonstrate the performance of this approach using real microarray data.
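The two steps named in the abstract (rank pathways by activation, then build one feature per pathway from its up- and down-regulated genes) can be illustrated with a minimal sketch. The abstract does not specify the activation measure or the feature formula, so everything here is an assumption made for illustration: a Welch t-statistic per gene, mean |t| as the pathway activation score, and "mean of up-regulated genes minus mean of down-regulated genes" as the pathway feature. The gene names, pathway names, and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic comparing two groups of expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def gene_t_stats(expr, labels):
    """Per-gene t-statistic, class 1 versus class 0."""
    stats = {}
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        stats[gene] = welch_t(pos, neg)
    return stats


def rank_pathways(pathways, t_stats):
    """Order pathways by mean |t| of their genes (assumed activation score)."""
    scores = {}
    for name, genes in pathways.items():
        members = [abs(t_stats[g]) for g in genes if g in t_stats]
        if members:
            scores[name] = mean(members)
    return sorted(scores, key=scores.get, reverse=True)


def pathway_feature(expr, genes, t_stats, n_samples):
    """Per-sample value: mean of up-regulated minus mean of down-regulated genes."""
    up = [g for g in genes if t_stats.get(g, 0.0) > 0]
    down = [g for g in genes if t_stats.get(g, 0.0) < 0]
    feature = []
    for i in range(n_samples):
        up_mean = mean([expr[g][i] for g in up]) if up else 0.0
        down_mean = mean([expr[g][i] for g in down]) if down else 0.0
        feature.append(up_mean - down_mean)
    return feature


# Hypothetical toy data: 4 genes, 6 samples (3 per class).
expr = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # higher in class 1
    "g2": [3.0, 3.2, 2.9, 1.0, 1.1, 0.8],  # lower in class 1
    "g3": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    "g4": [1.5, 1.4, 1.6, 1.5, 1.6, 1.4],  # no class difference
}
labels = [0, 0, 0, 1, 1, 1]
pathways = {"P_active": ["g1", "g2"], "P_quiet": ["g3", "g4"]}

t_stats = gene_t_stats(expr, labels)
ranked = rank_pathways(pathways, t_stats)
top = ranked[0]
features = pathway_feature(expr, pathways[top], t_stats, len(labels))
print(top, features)
```

In this toy example a single pathway feature replaces two gene-level features and separates the classes by sign. In a real study the t-statistics and pathway ranking would need to be computed on training samples only, inside each cross-validation fold, to avoid an optimistic bias in the estimated classification error.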
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Maciejewski, H. (2012). Feature Selection Based on Activation of Signaling Pathways Applied for Classification of Samples in Microarray Studies. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science, vol 7268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29350-4_34
DOI: https://doi.org/10.1007/978-3-642-29350-4_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29349-8
Online ISBN: 978-3-642-29350-4
eBook Packages: Computer Science (R0)