METRADISC-XL: A program for meta-analysis of multidimensional ranked discovery oriented datasets including microarrays

https://doi.org/10.1016/j.cmpb.2012.08.001

Abstract

We present comprehensive software for meta-analysis of ranked discovery-oriented datasets, such as those derived from microarrays or other high-throughput technologies, and for testing between-study heterogeneity for biological variables (gene expression, microRNA, proteomic, or other high-dimensional data). The software can identify biological probes that have either very high average ranks (e.g. consistently over-expressed genes) or very low average ranks (e.g. consistently under-expressed genes). The program tests each probe's average rank and the between-study heterogeneity of the study-specific ranks. Furthermore, it performs heterogeneity analyses restricted to probes with similar average ranks. The program allows both unweighted and weighted analysis. Statistical inferences are based on Monte Carlo permutation tests.

Introduction

The increasing availability of inexpensive high-throughput experiments is generating large-scale datasets oriented towards the discovery of biological signals and the understanding of biological patterns and associations [1]. The typical assays depend on microarray technology, but other high-throughput assays may also be used [2], [3], [4], [5], [6], [7], [8]. The biological variables measured by the assay probes may include gene expression, microRNA, proteins and peptides, metabolites, lipids, or other biological substrates of interest [3], [9], [10], [11]. However, inferences in most individual high-throughput studies are limited by small sample sizes and inconsistent results across studies. In addition, associations for each single biological variable are often rather weak regardless of what probes are used and what is measured. This situation creates a need to combine datasets across different studies to maximize power to detect genuine signals of association. A number of different methods and software have been developed for combining microarray and high-throughput data, each with relative strengths and limitations, as reviewed in [12]. As high-throughput databases become larger in the number of tested probes and often yield inconsistent results across different studies, there is ample room for developing improved publicly available software that would be more efficient in combining such datasets and also testing the extent of variability and inconsistency (heterogeneity) across studies.

Here we describe the further development of user-friendly, comprehensive software (called METRADISC-XL) for non-parametric meta-analysis of multidimensional ranked discovery datasets, such as those produced by microarray experiments. Its aim is to identify consistently extremely ranked probes, i.e. probes with high ranking and small heterogeneity across studies. METRADISC-XL implements the METRADISC methodology, which is described in detail in reference [13]. The program used in the original methodological paper [13] was customized for a specific example with a fixed number of parameters and could not be extended. METRADISC-XL can synthesize data for a considerably large number of studies and probes, can handle different types of missingness patterns, and places no fixed limit on the number of permutations.
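The core idea (an average rank and a between-study heterogeneity measure for each probe, both tested by Monte Carlo permutation) can be illustrated with a minimal Python sketch. This is not the METRADISC-XL code, which is written in Fortran. The heterogeneity statistic used here (sum of squared deviations from the probe's average rank), the unweighted averaging, the permutation scheme (ranks shuffled within each study among probes with observed values) and all function names are assumptions made for illustration; the exact definitions are given in [13].

```python
import random

MISSING = -99  # code for a missing rank, as used in the program's data file


def average_rank(ranks):
    """Unweighted mean of the observed (non-missing) ranks of one probe.

    Assumes the probe has at least one observed rank.
    """
    observed = [r for r in ranks if r != MISSING]
    return sum(observed) / len(observed)


def heterogeneity(ranks):
    """Sum of squared deviations of the observed ranks from their mean.

    Illustrative statistic only (an assumption of this sketch).
    Sorting makes the result independent of the order of the studies.
    """
    observed = sorted(r for r in ranks if r != MISSING)
    mean = sum(observed) / len(observed)
    return sum((r - mean) ** 2 for r in observed)


def permutation_test(data, n_perm=1000, seed=0):
    """data[i][j] is the rank of probe i in study j (MISSING if absent).

    Returns three lists of Monte Carlo P-values, one entry per probe:
    high average rank, low average rank, and high heterogeneity.
    """
    rng = random.Random(seed)
    n_probes, n_studies = len(data), len(data[0])
    obs_avg = [average_rank(row) for row in data]
    obs_het = [heterogeneity(row) for row in data]
    # For each study, the probes that actually have a rank in it.
    observed_idx = [
        [i for i in range(n_probes) if data[i][j] != MISSING]
        for j in range(n_studies)
    ]
    n_high = [0] * n_probes
    n_low = [0] * n_probes
    n_het = [0] * n_probes
    for _ in range(n_perm):
        perm = [row[:] for row in data]
        for j in range(n_studies):
            # Shuffle the observed ranks of study j; missing cells stay put.
            values = [data[i][j] for i in observed_idx[j]]
            rng.shuffle(values)
            for i, v in zip(observed_idx[j], values):
                perm[i][j] = v
        for i in range(n_probes):
            avg = average_rank(perm[i])
            het = heterogeneity(perm[i])
            n_high[i] += int(avg >= obs_avg[i])
            n_low[i] += int(avg <= obs_avg[i])
            n_het[i] += int(het >= obs_het[i])
    # Add-one correction so that no estimated P-value is exactly zero.
    scale = n_perm + 1
    p_high = [(c + 1) / scale for c in n_high]
    p_low = [(c + 1) / scale for c in n_low]
    p_het = [(c + 1) / scale for c in n_het]
    return p_high, p_low, p_het
```

In this sketch a probe that holds the top rank in every study receives a small P-value for a high average rank, because few random permutations reproduce such an extreme average. A weighted analysis would replace the plain mean with a weighted mean of the study-specific ranks.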

Methods

The theoretical framework behind the combination of data from multiple studies and testing for between-study heterogeneity is explained in detail in the previous presentation of METRADISC methodology [13]. We present a brief synopsis here. In analysing a primary microarray or other high-throughput dataset, the final output is usually a P-value (false positive rate) that shows the significance of the difference between compared groups of interest, e.g. differential gene expression between

Implementation

The program is written in Compaq Visual Fortran Professional Edition 6.6.0, uses the IMSL library, and runs under DOS. An executable file can be downloaded from http://biomath.med.uth.gr/metradisc/, and the Fortran code is available upon request from the corresponding author. The code can be compiled for other operating systems, provided that the IMSL library is accessible.

Operations

Data are entered from a user-created ASCII file that contains the ranks of the probes of each study and its information class (missing values are coded as −99). The ranks can be derived using a statistical package such as SAS or SPSS. The study weights are entered in a separate ASCII file. The data and weights files must be stored in the same directory as the METRADISC-XL executable file and named “data.txt” and “weights.txt”, respectively.
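As an alternative to SAS or SPSS, the ranks for one study could be derived with a few lines of Python. The sketch below is illustrative only: the function name, the choice to give the largest score the highest rank, and the use of mid-ranks for ties are assumptions, not part of the program's specification. Probes not measured in the study are passed as None and receive the missing code −99. The exact layout of “data.txt” is not described in this section, so the sketch stops at producing the rank column.

```python
MISSING = -99  # code expected by the program for a missing rank


def midranks(scores):
    """Rank one study's scores from 1 (lowest) to n (highest).

    scores: list of numbers, with None for probes not measured.
    Tied scores share the average of the positions they occupy.
    """
    # Sort observed scores, remembering each probe's original index.
    observed = sorted((s, i) for i, s in enumerate(scores) if s is not None)
    ranks = [MISSING] * len(scores)
    pos = 0
    while pos < len(observed):
        # Find the end of the run of scores tied with observed[pos].
        end = pos
        while end + 1 < len(observed) and observed[end + 1][0] == observed[pos][0]:
            end += 1
        # Average of the 1-based positions pos+1 .. end+1.
        mid = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[observed[k][1]] = mid
        pos = end + 1
    return ranks
```

For example, midranks([0.5, None, 2.0, 0.5, 1.0]) gives the two tied probes the rank 1.5, the unmeasured probe the code −99, and the remaining probes ranks 3 and 4.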

In executing the program there are two options: the main

Application

For demonstration, the program was applied to data from five studies that compared gene expression between diffuse large B-cell lymphoma and normal tissues. The data were extracted from ONCOMINE (www.ONCOMINE.org, accessed February 2, 2012), a cancer microarray database and integrated data-mining platform [16]. The five studies included the following numbers of genes: study1 = 8587, study2 = 15049, study3 = 2630, study4 = 2826 and study5 = 19574. In total, the number of distinct genes was 20409

Discussion and conclusion

METRADISC-XL is a freely available comprehensive software for combining microarray and other high throughput studies that allows the identification of biological probes with significant and consistent high or low ranking, with the option for unweighted and weighted analysis, and the examination of heterogeneity across studies [12]. In addition to microarrays, the software can be used for combining complex data sets derived from massive testing, discovery-oriented research such as mass

Conflicts of interest

The authors declare that there is no conflict of interest.
