Probability fold change: A robust computational approach for identifying differentially expressed gene lists
Introduction
DNA microarrays allow quick identification of differentially expressed genes (DEGs) that may be used to develop potential biomarkers, elucidate molecular mechanisms of diseases and group similar samples based on gene signatures [1], [2], [3], [4], [5], [6], [7], [8], [9]. Such differentially expressed genes provide a basis for understanding underlying biological questions and serve as a starting point for many follow-up analyses such as predictive cancer classification [10], [11], [12], gene ontology analysis [13] and pathway analysis [14], [15]. However, different ranking algorithms generate very different gene lists, and this could profoundly impact follow-up analyses and biological interpretation [16], [17]. Therefore, evaluating current ranking methods and developing improved ranking algorithms are critical in microarray data analysis.
Serious concerns have been raised regarding the consistency and reliability of DNA microarrays [18], [19], [20]. Recent studies, including the MicroArray Quality Control (MAQC) project, an unprecedentedly large effort involving multiple platforms and labs, have demonstrated that data analysis methods have a great impact on the consistency and reproducibility of DEG lists [16], [21], [22], [23], [24], [25]. Although the reproducibility of DEGs is influenced by many experimental factors such as sample preparation and hybridization, one of the most important sources of inconsistency is the choice of bioinformatics tools and algorithms used for data analysis [24], [25]. Gene lists ranked by fold change were found to be much more reproducible than those ranked by t-statistic based methods [16], [24], [25]. These studies strongly suggest that certain controllable factors, rather than the microarray technology itself, are largely responsible for producing spurious results [16], [24], [25].
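To make the contrast between fold change and t-statistic ranking concrete, the following minimal sketch computes both statistics for two genes and shows that they can order the same genes differently. The gene names and log2 intensities are hypothetical, and the Welch form of the t-statistic is an assumption made for the sketch; the studies cited above may have used a different variant.

```python
import math
import statistics


def mean_log_fold_change(treated, control):
    """Difference of group means on log2-scale data (a point estimate)."""
    return statistics.mean(treated) - statistics.mean(control)


def welch_t(treated, control):
    """Welch t-statistic: the mean difference scaled by its standard error."""
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return mean_log_fold_change(treated, control) / se


# Hypothetical log2 intensities: (treated replicates, control replicates).
genes = {
    "gene_a": ([10.0, 11.0, 12.0], [8.0, 9.0, 10.0]),    # large change, noisy
    "gene_b": ([8.50, 8.51, 8.52], [8.00, 8.01, 8.02]),  # small change, tight
}

# Rank by absolute value of each statistic, largest first.
rank_by_fc = sorted(
    genes, key=lambda g: abs(mean_log_fold_change(*genes[g])), reverse=True
)
rank_by_t = sorted(genes, key=lambda g: abs(welch_t(*genes[g])), reverse=True)

print("FC ranking:", rank_by_fc)
print("T ranking: ", rank_by_t)
```

In this toy case the fold change favors the gene with the larger but noisier change, while the t-statistic favors the gene with the small but very tightly replicated change, which illustrates how the two rankings can diverge.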
Generally speaking, a gene-ranking algorithm should be evaluated for both biological reproducibility and statistical accuracy. A good ranking algorithm should show high reproducibility, both in the concordance of DEG lists across different labs and microarray platforms and in robustness to different sample sizes and experimental errors. However, because reproducibility alone cannot tell us whether a DEG list is correct, one should also examine accuracy, which is reflected by a set of inter-related statistical measures including the false discovery rate (FDR), false negative rate (FNR), statistical power, Type I error, receiver operating characteristic (ROC), sensitivity, and specificity. Note that high reproducibility does not imply high accuracy, and vice versa. Therefore, reproducibility and accuracy are both critical quality metrics that should be considered together. Despite numerous comparative studies [16], [17], [25], [26], these quality aspects of different ranking algorithms are not completely understood.
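The two kinds of quality metric can be sketched in a few lines. The example below measures concordance as the fraction of genes shared by the top-k entries of two ranked lists, and computes accuracy measures for a top-k list against a known set of true DEGs. The function names, gene labels, and the choice of FNR = FN / (FN + TP) are assumptions made for illustration; other definitions of FNR are in use.

```python
def top_k_overlap(ranked_a, ranked_b, k):
    """Fraction of genes shared by the top-k entries of two ranked lists."""
    return len(set(ranked_a[:k]) & set(ranked_b[:k])) / k


def list_accuracy(ranked, true_degs, k):
    """Accuracy measures when the top-k genes of `ranked` are called DEGs.

    Assumes unique gene names, k >= 1, at least one true DEG and at least
    one non-DEG in `ranked`.
    """
    selected = set(ranked[:k])
    truth = set(true_degs) & set(ranked)
    tp = len(selected & truth)         # true DEGs that were selected
    fp = len(selected) - tp            # selected genes that are not DEGs
    fn = len(truth) - tp               # true DEGs that were missed
    tn = len(ranked) - tp - fp - fn    # non-DEGs correctly left out
    return {
        "FDR": fp / (tp + fp),
        "FNR": fn / (tp + fn),         # assumed definition: 1 - sensitivity
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical ranked lists from two labs and a hypothetical truth set.
lab_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
lab_b = ["g2", "g1", "g5", "g3", "g6", "g4"]
truth = {"g1", "g3", "g6"}

print("top-3 overlap:", top_k_overlap(lab_a, lab_b, 3))
print("lab_a accuracy:", list_accuracy(lab_a, truth, 3))
```

Two lists can overlap heavily while both miss true DEGs, which is why the overlap and the accuracy measures are reported separately.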
One difficulty in evaluating gene-ranking methods is the lack of validation data sets for assessing the reproducibility and accuracy of different ranking algorithms. Thanks to the MAQC project [16], [27], we were able to test the reproducibility of ranking methods on multiple microarray and quantitative real-time RT-PCR (qRT-PCR) platforms. In the MAQC study, microarray data were generated on multiple microarray platforms in many different labs using a standard set of reference RNA samples. Potential confounding experimental factors such as sample preparation, hybridization, and data preprocessing were well controlled. A substantial number (∼1000) of genes were also measured by qRT-PCR as an alternative technology. Furthermore, we tested the accuracy of gene-ranking algorithms using an Affymetrix Latin square spike-in data set [28] in which a small percentage of genes were spiked in as true DEGs at different concentrations.
Considerable research effort has been devoted to the development of gene-ranking algorithms for microarray data analysis [29], [30]. Recognizing the strengths and weaknesses of the popular methods, we developed a new ranking algorithm, the probabilistic fold change (PFC), from the perspective of microarray practitioners. PFC uses confidence interval estimates instead of the traditional expression fold change, which is a point estimate, to improve the confidence level of gene ranking. From the Bayesian perspective, the ranking statistic of PFC is treated as a distribution rather than a single value, and genes are ranked according to the cumulative probability. The mechanism of PFC is illustrated in Fig. 1 and will be described later. The reproducibility of gene-ranking methods was measured not only by the concordance of DEG lists across labs, platforms and qRT-PCR, but also by the resistance to experimental errors and small sample sizes. The accuracy was tested by the FDR and statistical power on a simulated data set based on MAQC data, and by the sensitivity and specificity on a Latin square spike-in data set. We examined the key factors that affect reproducibility and accuracy, including sample size, proportion of DEGs, random perturbation, and different microarray platforms. Our results indicated that PFC performs better than other popular ranking algorithms including SAM [31], mean fold change (FC), t-statistic (T), Bayesian t-statistic (BAYT) [32], intensity-conditioned fold change (CFC) [33], [34], and rank product (RP) [35].
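The idea of ranking by a cumulative probability rather than a point estimate can be illustrated with a simple sketch. This is not the PFC formula, which is not given in this excerpt; it is one possible reading under stated assumptions: the log2 fold change is approximated as normally distributed around the observed mean difference with the usual standard error, and genes are ranked by the probability that the absolute fold change exceeds a hypothetical threshold of 1 (a two-fold change). All names and data are illustrative.

```python
import math
import statistics


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_fold_change_exceeds(treated, control, threshold=1.0):
    """P(|log2 fold change| > threshold) under a normal approximation.

    Assumption for this sketch only: the fold change is distributed as
    Normal(observed mean difference, standard error of the difference).
    """
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    upper_tail = 1.0 - normal_cdf(threshold, diff, se)
    lower_tail = normal_cdf(-threshold, diff, se)
    return upper_tail + lower_tail


# Hypothetical log2 intensities: (treated replicates, control replicates).
genes = {
    "gene_a": ([10.0, 11.0, 12.0], [8.0, 9.0, 10.0]),    # change 2.0, noisy
    "gene_b": ([8.50, 8.51, 8.52], [8.00, 8.01, 8.02]),  # change 0.5, tight
    "gene_c": ([9.5, 9.6, 9.4], [8.0, 8.1, 7.9]),        # change 1.5, tight
}

scores = {g: prob_fold_change_exceeds(*data) for g, data in genes.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for g in ranking:
    print(g, round(scores[g], 4))
```

Under these assumptions a gene with a moderate but consistent change can rank above one with a larger but noisier change, while a gene with a small change ranks last however tight its replicates are. The score therefore reflects both the size of the fold change and the uncertainty around it.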
MAQC_Rat data
This data set [16] was derived from 36 rats with six biological replicates in each group (tissue/treatment): liver/aristolochic acid, kidney/aristolochic acid, liver/riddelliine, liver/comfrey, liver/control and kidney/control. The samples were aliquoted and distributed to five different labs using four microarray platforms (ABI: Applied Biosystems; AFX: Affymetrix; AG1: Agilent; GEH: GE Healthcare), with two labs using the Affymetrix microarray, denoted AFX and AF2. The data set contains
Calculation and results
We used three data sets in our comparative study: MAQC_Rat, MAQC_Human and the Latin square spike-in data (see Section 2 for details). The MAQC_Rat data set is our main test bed because it contains multiple biological replicate samples under different treatments, as in a real microarray experiment. The MAQC_Human data set is used to validate our study as it contains a large number (∼1000) of qRT-PCR measurements. The Latin square spike-in data set allows us to directly compute FDR and FNR for each
Discussion
In the microarray community, FC and T have been the most popular ranking algorithms, primarily due to their simplicity. Despite its high reproducibility, FC ranking unrealistically assumes constant variation across all transcripts. In FC ranking, genes with low fold change will always be ranked low in the list. As a result, FC may lose statistical power by failing to identify some important DEGs. The simulated spike-in study in the previous section may help us understand this. Fig. 6 shows the fc–se
Conclusions
We developed a new gene-ranking algorithm called PFC for identifying differentially expressed genes. Seven popular gene-ranking algorithms were evaluated based on their reproducibility and accuracy in generating DEG lists. The effects of relevant sources of variance, such as sample size, different platforms, and random errors, were examined. In our comparative study, the T statistic showed moderate accuracy but very poor reproducibility. The use of T statistic should be avoided in current microarray
Acknowledgements
The authors would like to thank Mr. Terrance M. Doherty for his review of the manuscript.
References
- et al. Transcriptomic fingerprinting of bone marrow-derived hepatic beta2m-/Thy-1+ stem cells. Biochem. Biophys. Res. Commun. (2005)
- et al. Link test—a statistical method for finding prostate cancer biomarkers. Comput. Biol. Chem. (2006)
- et al. Cancer classification and prediction using logistic regression with Bayesian gene selection. J. Biomed. Inform. (2004)
- et al. In vivo transcriptional profile analysis reveals RNA splicing and chromatin remodeling as prominent processes for adult neurogenesis. Mol. Cell Neurosci. (2006)
- A mixture model approach for the analysis of microarray gene expression data. Comput. Stat. Data Anal. (2002)
- et al. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett. (2004)
- et al. Use of real-time quantitative PCR to validate the results of cDNA array and differential display PCR technologies. Methods (2001)
- On the identification of differentially expressed genes: improving the generalized F-statistics for Affymetrix microarray gene expression data. Comput. Biol. Chem. (2006)
- et al. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science (2002)
- et al. DNA microarrays in drug discovery and development. Nat. Genet. (1999)
- Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc. Natl. Acad. Sci. U.S.A.
- Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature
- DNA microarrays and beyond: completing the journey from tissue to cell. Nat. Cell Biol.
- Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data. Nucleic Acids Res.
- Multi-class tumor classification by discriminant partial least squares using microarray gene expression data and assessment of classification models. Comput. Biol. Chem.
- Cross-platform analysis of cancer biomarkers: a Bayesian network approach to incorporating mass spectrometry and microarray data. Cancer Inform.
- Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science
- Analysis of the eye developmental pathway in Drosophila using DNA microarrays. Proc. Natl. Acad. Sci. U.S.A.
- Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A.
- Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat. Biotechnol.
- Empirical evaluation of data transformations and ranking statistics for microarray analysis. Nucleic Acids Res.
- Getting the noise out of gene arrays. Science
- Microarray reality checks in the context of a complex disease. Nat. Biotechnol.
- Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res.
- Standardizing global gene expression analysis between laboratories and across platforms. Nat. Meth.
- Multiple-laboratory comparison of microarray platforms. Nat. Meth.
- Independence and reproducibility across microarray platforms. Nat. Meth.
- Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics
- The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol.