Mining patterns in disease classification forests

https://doi.org/10.1016/j.jbi.2010.06.004
Under an Elsevier user license
open archive

Abstract

Multiple biological pathways often work together to determine a given disease phenotype. Understanding what these pathways are and how they cooperate in disease-relevant biological processes is critical to our understanding of diseases. Using microarray gene expression data, researchers have developed several methods to rank pathways by their disease relevance. However, the exact set of pathways involved and how they are involved under given disease conditions remain unclear. In this paper, we propose a novel method that first selects a robust set of pathways that together best classify a given disease, and then investigates how genes in these pathways interact to determine the phenotype. By applying our method to several disease-related microarray gene expression datasets, we detected many disease-relevant interaction patterns supported by evidence from the literature. Our algorithm also identifies a robust set of disease-relevant pathways more accurately than alternative strategies.

Keywords

Biological pathway
Pattern mining
Random forests
Disease phenotype
