Research articleAn elastic network model to identify characteristic stress response genes
Introduction
The state of a biological cell is well approximated by a genome-wide gene expression profile—frequently measured by microarrays. A search in the literature reveals thousands of gene expression profiling experiments in recent years (Barrett et al., 2009). The popularity is due to the efficiency of the technology in terms of the number of measurements per experiment and the cost per measurement. It is common practice to use gene expression profiling for comparing the states of two cell populations that have been exposed to different conditions. This type of experiment is used to delineate differentially expressed probesets, genes or pathways. The result allows speculation about properties of the different conditions, reasoning about the functions of differentially expressed genes, or postulation of biomarkers as predictors of the applied conditions. A straightforward approach to discover differentially expressed genes is the use of p-values from a t-test (or ANOVA if more than two conditions are to be compared) as the criterion. Several advancements upon this basic approach have been developed, e.g. (Kadota et al., 2008). Performing such an analysis for full genome expression data requires correcting the p-values for multiple testing. Common approaches use the Bonferroni (Gordon et al., 2007) or Benjamini Hochberg FDR correction (Benjamini and Hochberg, 1995, Storey and Tibshirani, 2003). For many biological conditions under examination, the level of change to the transcriptome is quite minute, making it difficult to detect differentially expressed genes at all, or, if tightly coregulated genes alter their expression level, the response is very broad (Khil and Camerini-Otero, 2002, Luscombe et al., 2004), involving hundreds of genes. In particular, the situation with large numbers of differentially expressed genes is difficult to interpret. Often the affected genes are spread over many different pathways or biological processes. While it is possible to identify pathways that are connected to the biological question studied, many other pathways are not directly related. The obvious challenge is to generate new hypotheses from this type of result.
A gene-wise or probeset-wise analysis of expression data has the drawback that it does not take into account the coregulation structure of gene expression data. The coregulation structure is a fundamental property of biological networks (Stuart et al., 2003) and e.g. caused by a set of transcription factors binding sites that is shared among coregulated genes (Brivanlou and Darnell, 2002, Blais and Dynlacht, 2005). To improve statistical inference from microarray studies, the complex dependence structure between genes can be embraced. If several genes show differential gene expression and belong to the same biological theme, e.g. metabolic pathway, weak statistical signals of single genes sum up to more solid evidence on the level of gene sets (Subramanian et al., 2005). That method, commonly referred to as gene set enrichment analysis, relies on gene lists ranked according to p-values from a t-test statistic from gene set comparisons. In practice, these approaches have proven to be very powerful with the clear disadvantage of requiring prior knowledge on sets of related genes.
If the response of a cell to external stress is measured by gene expression profiling, often a large fraction of genes shows differential expression, despite a distinct and localised mode of toxicity. The grade of coherent response between two different genes is governed by their degree of coregulated in the cellular stress reaction program. This article regards gene expression in close analogy to linear network theory. Genes are represented as nodes of a weighted network. The links between genes are weighed by their pairwise coexpression if stress is put on some nodes. A change in environmental conditions thus leads to an adjustment of the expression levels of many genes. The expression profile x is modeled aswith coexpression matrix A and a perturbation vector u. A detailed derivation of the model can be found in Appendix A. Consequently, searching for good biomarkers which appropriately characterize the response must exploit detailed knowledge of the interaction structure of genes. It is well known that this complex coregulation can be inferred from multiple measurements by calculating the coexpression matrix (Gardner et al., 2003). Experimentally inferring all entries of A requires an unrealistic number of experiments. It is known that a cell’s genome-wide stress response is caused by a small number of stress response modes (López-Maury et al., 2008). In eukaryotes such modes may be e.g. related to anti-oxidation, heat shock, or energy generation (Girardot et al., 2004). Each mode is represented by several coexpressed genes. In order to characterize the stress response of a cell it is thus sufficient to reconstruct only those few modes. In this paper, we develop a method which focuses on the identification of these stress response modes.
More specifically, given two sets of cells (controls and stressed cells) the expression value of a probeset from stressed cells is modeled as the sum of a characteristic expression value in controls, a shift component induced by applying stress and a noise component .The standard deviation of the noise component is denoted by . The noise component, though a mixture of technical and biological noise, is solely interpreted as biological. As the relative influence of technical noise decreases with increasing expression value it is advisable to restrict analysis to highly expressed probesets. Using Eq. (1) gives a linear model for the global stress response which is dominated by a small number of response modes. Given the validity of the linear model and due to the coexpression among the genes within a response mode, the shift component can be represented by few probesets. These representative probesets are identified by performing a forward stagewise feature selection procedure with probesets ranked by least-angle regression (LARS) (Efron et al., 2004) according to their importance for accurately modeling the characteristic shift component. This results in a vector u with the minimum number of nonzero entries required for appropriate modeling. The linear model is limited in that it does not cover on/off regulation of pathways or feedback loops that may sharply amplify the expression of pathways. Therefore, probesets whose expression value significantly deviates from the linear model indicate changes in the coregulation structure of the respective genes induced by non-generic stress response mechanisms. These outlier probesets and those having a nonzero coefficient in the perturbation vector u completely describe the cellular response. The model provides a background allowing the quantitative separation of the experimental, genome-wide data into the contributions of the generic core response mechanisms and the contributions of compound-specific, local or non-linear atypical stress response mechanisms (López-Maury et al., 2008).
In toxicology, it is common praxis to assess the carcinogenicity of a chemical compound with a 2-year rodent bioassay. As a shortcut, it is envisioned to use gene expression profiling after as little as 2 weeks of experimentation. It is assumed that an early cellular response is predictive for the toxicological outcome. The toxicity classes under consideration, here genotoxic carcinogen and non-genotoxic carcinogen, display similar, genome-wide stress responses. Hence it is important to develop a method that can delineate specific and general aspects of stress responses. As outlined above, a linear elastic network model allows to identify markers for general and specific stress responses. We used this approach to analyze data from a toxicogenomic experiment (Ellinger-Ziegelbauer et al., 2008, Ellinger-Ziegelbauer et al., 2005). A test set of 138 Affymetrix GeneChip experiments (13 different training compounds, replicas and controls) was used to develop the method and to train the model. An independent validation set of 189 experiments (16 validation compounds) was used to validate the accuracy of the approach.
The test set was used to derive a network model for linear stress response for each toxicity class. Each model was defined by a small set of probesets , called attractors, which had a nonzero coefficient in u. Some probesets deviated from the prediction and are denoted as outlier genes . Since and completely describe the observed cellular response to stress, they are, together, denoted as biomarkers. The biomarkers were analyzed with respect to the biological role of their genes. In addition, they were evaluated as classifiers for each toxicity class using an artificial neural network. The performance of the classifier was assessed using the validation data set. Fig. 1 presents a flow chart of experimental and computational steps.
Section snippets
Expression profiles induced by genotoxic and non-genotoxic carcinogens in rat liver
In this study, a methodology was developed to analyze gene expression response upon exposure to stress. The experimental data was previously generated with the ambition to identify biomarkers to classify the carcinogenic potential of chemical compounds (Ellinger-Ziegelbauer et al., 2008). Three different classes of xenobiotic compounds – genotoxic carcinogens, non-genotoxic carcinogens, and non-hepatocarcinogens – were analyzed. Male Wistar rats were treated for up to 14 days with selected
Results
Attractor and outlier probesets were estimated for each class of compounds using a linear elastic response model of gene expression as described in the methods section. The discrepancy between the measured gene expression and the network model, measured by its residual sum of squares (RSS) value, was calculated with respect to the allowed number of attractor genes n. Fig. 2 shows an example of the relationship between the model quality and number of genes. We chose a cut off for the number of
Discussion
Our work was based on expression profiles of stressed and unstressed liver tissue. From gene expression data a linear elastic network model was calculated and biomarkers were derived. The validation of the elastic response network model is twofold: (A) The biological role of the biomarkers is examined, in particular in comparison with a-priori expectations for the different compound classes. (B) Biomarkers are used to build classifiers of toxicity classes. The classifier is validated on an
Conclusion
Here a new method to identify biomarkers for specific stress states was developed. This method is based on describing global gene expression in terms of a linear network model. Validating this approach was two fold. It is claimed that the set of genes that define the network model, , and the outliers to the model, , in conjunction define the response of the cell. If this is true, it shall be expected that the set of genes is biologically meaningful for the type of stress exerted. It is
Acknowledgements
Dr. Thomas Mrziglod supported the authors in the use of artificial neural networks. John Vitovsky kindly proof-read the manuscript.
We acknowledge financial support by the Bundesministerium für Bildung und Forschung (BMBF) for funding parts of this work through the HepatoSys network.
References (35)
- et al.
On a class of efficient learning algorithms for neural networks
Neural Netw.
(1992) - et al.
The enzymatic inactivation of the fatty acid amide class of signaling lipids
Chem. Phys. Lipids
(2002) - et al.
Prediction of a carcinogenic potential of rat hepatocarcinogens using toxicogenomics analysis of short-term in vivo studies
Mutat. Res.
(2008) - et al.
Comparison of the expression profiles induced by genotoxic and nongenotoxic carcinogens in rat liver
Mutat. Res.
(2005) - et al.
Gene ontology: tool for the unification of biology. the gene ontology consortium
Nat. Genet.
(2000) - Bärmann, F., 2009. Nn-tool....
- et al.
Ncbi geo: archive for high-throughput functional genomic data
Nucleic Acids Res.
(2009) - et al.
Controlling the false discovery rate: a practical and powerful approach to multiple testing
J. R. Stat. Soc. B: Methodol.
(1995) - et al.
Constructing transcriptional regulatory networks
Genes Dev.
(2005) - et al.
Protection against aflatoxin b1-induced cytotoxicity by expression of the cloned aflatoxin b1-aldehyde reductases rat akr7a1 and human akr7a3
Chem. Res. Toxicol.
(2008)
Signal transduction and the control of gene expression
Science
Least angle regression
Ann. Stat.
Chemical mutagenesis testing in drosophila. ix. Results of 50 coded compounds tested for the national toxicology program
Environ. Mol. Mutagen.
Multivariate gene expression analysis reveals functional connectivity changes between normal/tumoral prostates
BMC Syst. Biol.
Inferring genetic networks and identifying compound mode of action via expression profiling
Science
Genome wide analysis of common and specific stress responses in adult drosophila melanogaster
BMC Genomics
Control of the mean number of false discoveries, bonferroni and stability of multiple testing
Ann. Appl. Stat.
Cited by (1)
- 1
These authors contributed equally to this work.