Abstract
The advent of high-throughput technology has made it possible to measure genome-wide expression profiles, thus providing a new basis for microarray-based diagnosis of disease states. Numerous methods have been proposed to identify biomarkers that can accurately discriminate between case and control classes. Many of these methods use only a subset of the ranked genes in each pathway and may therefore fail to fully represent the classification boundary between the two disease classes. This study proposes using negatively correlated feature sets (NCFS) to obtain more relevant features, in the form of phenotype-correlated genes (PCOGs), and to infer pathway activities. The two pathway activity inference schemes that use NCFS significantly improved the power of pathway markers to discriminate between two phenotype classes in microarray expression datasets of breast cancer. In particular, the NCFS-i method provided better contrasting features for classification purposes. The improvement was consistent across all pathways used, in both within- and across-dataset validation. The results show that the two proposed NCFS-based methods clearly outperformed other pathway-based classifiers in terms of both ROC area and discriminative score. That is, identifying PCOGs within each pathway, especially with the NCFS-i method, helps to reduce noisy or variable measurements, leading to a higher-performing and more robust classifier. In summary, we have demonstrated that effectively incorporating pathway information into expression-based disease diagnosis, together with NCFS, can provide more discriminative and more robust models.
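The abstract does not give the formulas behind NCFS-i or the second NCFS scheme, so the following is only a minimal sketch of the general idea, under stated assumptions: genes in a pathway are split into two oppositely behaving sets by the sign of a two-class t-score, and a per-sample pathway activity is formed from the z-scored expression of both sets. The function names, the Welch t-score, the square-root scaling, and the toy data are all illustrative assumptions, not the authors' procedure.

```python
import math
from statistics import mean, stdev


def t_score(a, b):
    """Welch t-statistic between two lists of values (assumed scoring rule)."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    denom = math.sqrt(va / len(a) + vb / len(b))
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def split_by_sign(expr, labels):
    """Split pathway genes into 'up in cases' and 'down in cases' sets.

    expr:   dict mapping gene name -> list of expression values, one per sample
    labels: list of 0/1 phenotype labels in the same sample order
    """
    pos, neg = [], []
    for gene, values in expr.items():
        case = [v for v, y in zip(values, labels) if y == 1]
        ctrl = [v for v, y in zip(values, labels) if y == 0]
        (pos if t_score(case, ctrl) >= 0 else neg).append(gene)
    return pos, neg


def zscores(values):
    """Standardize one gene's expression values across samples."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def pathway_activity(expr, pos, neg):
    """Per-sample activity: summed z-scores of the 'up' set minus the 'down'
    set, scaled by the square root of the number of genes used (assumption)."""
    z = {g: zscores(expr[g]) for g in pos + neg}
    n_samples = len(next(iter(expr.values())))
    k = math.sqrt(len(pos) + len(neg)) or 1.0
    return [
        (sum(z[g][i] for g in pos) - sum(z[g][i] for g in neg)) / k
        for i in range(n_samples)
    ]


# Toy pathway with two hypothetical genes and eight samples (4 cases, 4 controls).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "gene_up": [2.1, 1.9, 2.3, 1.7, 0.2, -0.1, 0.1, -0.2],
    "gene_down": [-1.8, -2.2, -2.0, -1.9, 0.1, 0.0, -0.2, 0.3],
}
pos, neg = split_by_sign(expr, labels)
activity = pathway_activity(expr, pos, neg)
# A t-score on the inferred activity, used here as a simple discriminative score.
score = t_score(activity[:4], activity[4:])
```

In this sketch, subtracting the negatively correlated set rather than averaging all genes together keeps the two sets from cancelling each other out, which is one plausible reading of why contrasting feature sets could sharpen the separation between phenotypes.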





Acknowledgments
The main author (PS) gratefully acknowledges financial support from the National Research Council of Thailand, the School of Information Technology, King Mongkut’s University of Technology Thonburi, and Burapha University during his doctoral study at King Mongkut’s University of Technology Thonburi. PS is especially thankful to Mr. Ponlavit Larpeampaisarl, who helped to implement the script for PCOG identification and activity inference.
Conflict of interests
The authors declare that they have no competing interests.
Cite this article
Sootanan, P., Prom-on, S., Meechai, A. et al. Pathway-based microarray analysis for robust disease classification. Neural Comput & Applic 21, 649–660 (2012). https://doi.org/10.1007/s00521-011-0662-y
DOI: https://doi.org/10.1007/s00521-011-0662-y