METRADISC-XL: A program for meta-analysis of multidimensional ranked discovery oriented datasets including microarrays
Introduction
The increasing availability of inexpensive high-throughput experiments is generating large-scale datasets oriented towards the discovery of biological signals and the understanding of biological patterns and associations [1]. The typical assays depend on microarray technology, but other high-throughput assays may also be used [2], [3], [4], [5], [6], [7], [8]. The biological variables measured by the assay probes may include gene expression, microRNA, proteins and peptides, metabolites, lipids, or other biological substrates of interest [3], [9], [10], [11]. However, inferences in most individual high-throughput studies are limited by small sample sizes and inconsistent results across studies. In addition, associations for each single biological variable are often rather weak regardless of what probes are used and what is measured. This situation creates a need to combine datasets across different studies to maximize power to detect genuine signals of association. A number of different methods and software have been developed for combining microarray and high-throughput data, each with relative strengths and limitations, as reviewed in [12]. As high-throughput databases become larger in the number of tested probes and often yield inconsistent results across different studies, there is ample room for developing improved publicly available software that would be more efficient in combining such datasets and also testing the extent of variability and inconsistency (heterogeneity) across studies.
Here we describe the further development of a user-friendly, comprehensive software package (METRADISC-XL) for non-parametric meta-analysis of multidimensional ranked discovery datasets, such as those produced by microarray experiments. Its aim is to identify consistently extremely ranked probes, i.e. probes with high ranking and small heterogeneity across studies. METRADISC-XL implements the METRADISC methodology, which is described in detail in reference [13]. The program used in the original methodological paper [13] was customized for a specific example with a fixed number of parameters and could not be extended. METRADISC-XL can synthesize data for a considerably large number of studies, probes and types of missingness pattern, and it places no fixed limit on the number of permutations.
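The notion of a consistently extremely ranked probe can be illustrated with a minimal sketch. It assumes that the per-probe summary is the unweighted mean rank across the studies that measured the probe, that heterogeneity is the sum of squared deviations from that mean, and that significance is assessed by shuffling ranks within each study while preserving the missingness pattern. These definitions, the function names and the toy data are illustrative assumptions; the exact statistics are those given in [13], not this code.

```python
import random

MISSING = -99  # code for a missing rank in the input data


def mean_rank_and_heterogeneity(ranks):
    """Return (mean rank, sum of squared deviations) for one probe.

    `ranks` holds one rank per study; MISSING entries are skipped.
    Assumes the probe was measured in at least one study.
    """
    observed = [r for r in ranks if r != MISSING]
    mean_rank = sum(observed) / len(observed)
    heterogeneity = sum((r - mean_rank) ** 2 for r in observed)
    return mean_rank, heterogeneity


def permutation_p_values(data, n_perm=1000, seed=0):
    """One-sided permutation p-value for a high mean rank, per probe.

    `data` maps probe name -> list of ranks (one entry per study).
    In each permutation the non-missing ranks of every study are
    shuffled among the probes measured in that study, so the
    missingness pattern is unchanged.
    """
    rng = random.Random(seed)
    probes = list(data)
    n_studies = len(data[probes[0]])
    observed = {p: mean_rank_and_heterogeneity(data[p])[0] for p in probes}
    exceed = {p: 0 for p in probes}
    for _ in range(n_perm):
        permuted = {p: list(data[p]) for p in probes}
        for s in range(n_studies):
            present = [p for p in probes if data[p][s] != MISSING]
            values = [data[p][s] for p in present]
            rng.shuffle(values)
            for p, v in zip(present, values):
                permuted[p][s] = v
        for p in probes:
            if mean_rank_and_heterogeneity(permuted[p])[0] >= observed[p]:
                exceed[p] += 1
    # Add-one correction keeps every estimated p-value above zero.
    return {p: (exceed[p] + 1) / (n_perm + 1) for p in probes}


# Hypothetical toy data: three studies, higher rank = stronger signal.
# Ranks are not standardized for study size here, which a real
# analysis would need to handle.
toy = {
    "probeA": [5, 5, 3],
    "probeB": [1, 2, MISSING],
    "probeC": [3, 1, 1],
    "probeD": [4, 4, 2],
    "probeE": [2, 3, MISSING],
}
```

Calling `permutation_p_values(toy)` returns one p-value per probe. A probe such as `probeA`, which sits at the top of every study, would be expected to receive a small p-value together with a small heterogeneity value, which is the pattern the program is designed to detect.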
Methods
The theoretical framework behind the combination of data from multiple studies and testing for between-study heterogeneity is explained in detail in the previous presentation of METRADISC methodology [13]. We present a brief synopsis here. In analysing a primary microarray or other high-throughput dataset, the final output is usually a P-value (false positive rate) that shows the significance of the difference between compared groups of interest, e.g. differential gene expression between
Implementation
The program is written in Compaq Visual Fortran Professional Edition 6.6.0, uses the IMSL library and runs under DOS. An executable file can be downloaded from http://biomath.med.uth.gr/metradisc/, and the Fortran code is available upon request from the corresponding author. The code is suitable for other operating systems, but the IMSL library must be accessible in order to compile it.
Operations
Data are entered from a user-created ASCII file that contains the ranks of the probes in each study and the information class of each probe (missing values are coded as −99). The ranks can be derived using a statistical package such as SAS or SPSS. The study weights are entered in a separate ASCII file. The data and weights files must be stored in the same directory as the METRADISC-XL executable file and named “data.txt” and “weights.txt”, respectively.
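As an alternative to a statistical package, the ranks could be derived with a short script. The sketch below is hedged on several assumptions that the text does not specify: that the smallest P-value receives the highest rank, that ties share their mid-rank, that “data.txt” holds one row per probe with one whitespace-separated column per study, and that “weights.txt” holds one weight per line. The information-class column is omitted, and all function names are hypothetical. The layout should be checked against the program documentation before use.

```python
from pathlib import Path

MISSING = -99  # missing-value code expected by METRADISC-XL


def midranks(p_values):
    """Rank P-values so that the smallest gets the highest rank.

    Tied P-values share the average of the positions they occupy.
    Entries given as None are returned as MISSING.
    """
    present = [(p, i) for i, p in enumerate(p_values) if p is not None]
    present.sort(key=lambda t: -t[0])  # largest P-value first -> rank 1
    ranks = [MISSING] * len(p_values)
    pos = 0
    while pos < len(present):
        end = pos
        while end + 1 < len(present) and present[end + 1][0] == present[pos][0]:
            end += 1
        mid = (pos + 1 + end + 1) / 2  # mean of 1-based positions pos+1..end+1
        for _, i in present[pos:end + 1]:
            ranks[i] = mid
        pos = end + 1
    return ranks


def format_rows(ranks_by_study):
    """Build one text row per probe, one column per study (assumed layout)."""
    n_probes = len(ranks_by_study[0])
    return [
        " ".join(f"{study[probe]:g}" for study in ranks_by_study)
        for probe in range(n_probes)
    ]


def write_inputs(ranks_by_study, weights, directory="."):
    """Write data.txt and weights.txt into `directory` as ASCII text."""
    out = Path(directory)
    rows = format_rows(ranks_by_study)
    (out / "data.txt").write_text("\n".join(rows) + "\n", encoding="ascii")
    weight_lines = [f"{w:g}" for w in weights]
    (out / "weights.txt").write_text("\n".join(weight_lines) + "\n", encoding="ascii")
```

For example, `midranks([0.01, 0.20, None, 0.20])` assigns the highest rank to the first probe, a shared mid-rank to the two tied probes, and −99 to the unmeasured probe. If the opposite ranking direction is required, the sort key can be reversed.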
In executing the program there are two options: the main
Application
For demonstration, the program was applied to data from five studies that compared the gene expression between diffuse large B-cell lymphoma vs. normal tissues. The data were extracted from ONCOMINE (www.ONCOMINE.org, accessed February 2, 2012), a cancer microarray database and integrated data-mining platform [16]. The five studies included the following number of genes: study1 = 8587, study2 = 15049, study3 = 2630, study4 = 2826 and study5 = 19574. In total, the number of distinct genes was 20409
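The gene counts above imply that every study measured only a subset of the 20409 distinct genes. The following minimal sketch works out that arithmetic. It assumes that each study's genes all belong to the combined set and that a gene absent from a study would be entered as missing (−99) for that study; the variable names are illustrative only.

```python
# Gene counts per study and the total number of distinct genes,
# as reported in the text.
study_sizes = {
    "study1": 8587,
    "study2": 15049,
    "study3": 2630,
    "study4": 2826,
    "study5": 19574,
}
total_distinct = 20409

# Genes not measured in a study (assumed to be coded as missing).
missing = {name: total_distinct - n for name, n in study_sizes.items()}

# Fraction of the combined gene set covered by each study.
coverage = {name: n / total_distinct for name, n in study_sizes.items()}

for name in study_sizes:
    print(f"{name}: {missing[name]} genes missing, "
          f"coverage {coverage[name]:.1%}")
```

The smaller studies leave most of the combined gene set unmeasured, so the handling of missingness patterns matters considerably for a dataset of this kind.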
Discussion and conclusion
METRADISC-XL is a freely available comprehensive software for combining microarray and other high throughput studies that allows the identification of biological probes with significant and consistent high or low ranking, with the option for unweighted and weighted analysis, and the examination of heterogeneity across studies [12]. In addition to microarrays, the software can be used for combining complex data sets derived from massive testing, discovery-oriented research such as mass
Conflicts of interest
The authors declare that there is no conflict of interest.
References (18)
- et al. Gene expression in Trypanosoma brucei: lessons from high-throughput RNA sequencing. Trends in Parasitology (2011)
- et al. The edited transcriptome: novel high throughput approaches to detect nucleotide deamination. Current Opinion in Genetics and Development (2011)
- et al. High throughput cellular screens to interrogate the human T and B cell repertoires. Current Opinion in Immunology (2011)
- et al. ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia (2004)
- et al. Forest classification trees and forest support vector machines algorithms: demonstration using microarray data. Computers in Biology and Medicine (2010)
- et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (1999)
- et al. The emerging field of quantitative blood metabolomics for biomarker discovery in critical illnesses. American Journal of Respiratory and Critical Care Medicine (2011)
- et al. On the future of “omics”: lipidomics. Journal of Inherited Metabolic Disease (2011)
- et al. Ultrahigh-throughput screening in drop-based microfluidics for directed evolution. Proceedings of the National Academy of Sciences of the United States of America (2010)