Research article
An elastic network model to identify characteristic stress response genes

https://doi.org/10.1016/j.compbiolchem.2010.06.004Get rights and content

Abstract

Exposing eukaryotic cells to a toxic compound and subsequent gene expression profiling may allow the prediction of selected toxic effects based on changes in gene expression. This objective is complicated by the observation that compounds with different modes of toxicity cause similar changes in gene expression and that a global stress response affects many genes. We developed an elastic network model of global stress response with nodes representing genes which are connected by edges of graded coexpression. The expression of only few genes have to be known to model the global stress response of all but a few atypical responder genes. Those required genes and the atypical response genes are shown to be good biomarker for tox predictions. In total, 138 experiments and 13 different compounds were used to train models for different toxicity classes. The deduced biomarkers were shown to be biologically plausible. A neural network was trained to predict the toxic effects of compounds from profiling experiments. On a validation data set of 189 experiments with 16 different compounds the accuracy of the predictions was assessed: 14 out of 16 compounds have been classified correctly. Derivation of model based biomarkers through the elastic network approach can naturally be extended to other areas beyond toxicology since subtle signals against a broad response background are common in biological studies.

Introduction

The state of a biological cell is well approximated by a genome-wide gene expression profile—frequently measured by microarrays. A search in the literature reveals thousands of gene expression profiling experiments in recent years (Barrett et al., 2009). The popularity is due to the efficiency of the technology in terms of the number of measurements per experiment and the cost per measurement. It is common practice to use gene expression profiling for comparing the states of two cell populations that have been exposed to different conditions. This type of experiment is used to delineate differentially expressed probesets, genes or pathways. The result allows speculation about properties of the different conditions, reasoning about the functions of differentially expressed genes, or postulation of biomarkers as predictors of the applied conditions. A straightforward approach to discover differentially expressed genes is the use of p-values from a t-test (or ANOVA if more than two conditions are to be compared) as the criterion. Several advancements upon this basic approach have been developed, e.g. (Kadota et al., 2008). Performing such an analysis for full genome expression data requires correcting the p-values for multiple testing. Common approaches use the Bonferroni (Gordon et al., 2007) or Benjamini Hochberg FDR correction (Benjamini and Hochberg, 1995, Storey and Tibshirani, 2003). For many biological conditions under examination, the level of change to the transcriptome is quite minute, making it difficult to detect differentially expressed genes at all, or, if tightly coregulated genes alter their expression level, the response is very broad (Khil and Camerini-Otero, 2002, Luscombe et al., 2004), involving hundreds of genes. In particular, the situation with large numbers of differentially expressed genes is difficult to interpret. Often the affected genes are spread over many different pathways or biological processes. While it is possible to identify pathways that are connected to the biological question studied, many other pathways are not directly related. The obvious challenge is to generate new hypotheses from this type of result.

A gene-wise or probeset-wise analysis of expression data has the drawback that it does not take into account the coregulation structure of gene expression data. The coregulation structure is a fundamental property of biological networks (Stuart et al., 2003) and e.g. caused by a set of transcription factors binding sites that is shared among coregulated genes (Brivanlou and Darnell, 2002, Blais and Dynlacht, 2005). To improve statistical inference from microarray studies, the complex dependence structure between genes can be embraced. If several genes show differential gene expression and belong to the same biological theme, e.g. metabolic pathway, weak statistical signals of single genes sum up to more solid evidence on the level of gene sets (Subramanian et al., 2005). That method, commonly referred to as gene set enrichment analysis, relies on gene lists ranked according to p-values from a t-test statistic from gene set comparisons. In practice, these approaches have proven to be very powerful with the clear disadvantage of requiring prior knowledge on sets of related genes.

If the response of a cell to external stress is measured by gene expression profiling, often a large fraction of genes shows differential expression, despite a distinct and localised mode of toxicity. The grade of coherent response between two different genes is governed by their degree of coregulated in the cellular stress reaction program. This article regards gene expression in close analogy to linear network theory. Genes are represented as nodes of a weighted network. The links between genes are weighed by their pairwise coexpression if stress is put on some nodes. A change in environmental conditions thus leads to an adjustment of the expression levels of many genes. The expression profile x is modeled asx=Auwith coexpression matrix A and a perturbation vector u. A detailed derivation of the model can be found in Appendix A. Consequently, searching for good biomarkers which appropriately characterize the response must exploit detailed knowledge of the interaction structure of genes. It is well known that this complex coregulation can be inferred from multiple measurements by calculating the coexpression matrix (Gardner et al., 2003). Experimentally inferring all entries of A requires an unrealistic number of experiments. It is known that a cell’s genome-wide stress response is caused by a small number of stress response modes (López-Maury et al., 2008). In eukaryotes such modes may be e.g. related to anti-oxidation, heat shock, or energy generation (Girardot et al., 2004). Each mode is represented by several coexpressed genes. In order to characterize the stress response of a cell it is thus sufficient to reconstruct only those few modes. In this paper, we develop a method which focuses on the identification of these stress response modes.

More specifically, given two sets of cells (controls and stressed cells) the expression value xi of a probeset from stressed cells is modeled as the sum of a characteristic expression value x¯i,control in controls, a shift component Δx¯i,char induced by applying stress and a noise component ψi.xi=x¯i,control+Δx¯i,char+ψiThe standard deviation of the noise component ψi is denoted by σi. The noise component, though a mixture of technical and biological noise, is solely interpreted as biological. As the relative influence of technical noise decreases with increasing expression value it is advisable to restrict analysis to highly expressed probesets. Using Eq. (1) gives a linear model for the global stress response Δx¯char which is dominated by a small number of response modes. Given the validity of the linear model and due to the coexpression among the genes within a response mode, the shift component Δx¯i,char can be represented by few probesets. These representative probesets are identified by performing a forward stagewise feature selection procedure with probesets ranked by least-angle regression (LARS) (Efron et al., 2004) according to their importance for accurately modeling the characteristic shift component. This results in a vector u with the minimum number of nonzero entries required for appropriate modeling. The linear model is limited in that it does not cover on/off regulation of pathways or feedback loops that may sharply amplify the expression of pathways. Therefore, probesets whose expression value significantly deviates from the linear model indicate changes in the coregulation structure of the respective genes induced by non-generic stress response mechanisms. These outlier probesets and those having a nonzero coefficient in the perturbation vector u completely describe the cellular response. The model provides a background allowing the quantitative separation of the experimental, genome-wide data into the contributions of the generic core response mechanisms and the contributions of compound-specific, local or non-linear atypical stress response mechanisms (López-Maury et al., 2008).

In toxicology, it is common praxis to assess the carcinogenicity of a chemical compound with a 2-year rodent bioassay. As a shortcut, it is envisioned to use gene expression profiling after as little as 2 weeks of experimentation. It is assumed that an early cellular response is predictive for the toxicological outcome. The toxicity classes under consideration, here genotoxic carcinogen and non-genotoxic carcinogen, display similar, genome-wide stress responses. Hence it is important to develop a method that can delineate specific and general aspects of stress responses. As outlined above, a linear elastic network model allows to identify markers for general and specific stress responses. We used this approach to analyze data from a toxicogenomic experiment (Ellinger-Ziegelbauer et al., 2008, Ellinger-Ziegelbauer et al., 2005). A test set of 138 Affymetrix GeneChip experiments (13 different training compounds, replicas and controls) was used to develop the method and to train the model. An independent validation set of 189 experiments (16 validation compounds) was used to validate the accuracy of the approach.

The test set was used to derive a network model for linear stress response for each toxicity class. Each model was defined by a small set of probesets Gu, called attractors, which had a nonzero coefficient in u. Some probesets deviated from the prediction and are denoted as outlier genes Go. Since Go and Gu completely describe the observed cellular response to stress, they are, together, denoted as biomarkers. The biomarkers were analyzed with respect to the biological role of their genes. In addition, they were evaluated as classifiers for each toxicity class using an artificial neural network. The performance of the classifier was assessed using the validation data set. Fig. 1 presents a flow chart of experimental and computational steps.

Section snippets

Expression profiles induced by genotoxic and non-genotoxic carcinogens in rat liver

In this study, a methodology was developed to analyze gene expression response upon exposure to stress. The experimental data was previously generated with the ambition to identify biomarkers to classify the carcinogenic potential of chemical compounds (Ellinger-Ziegelbauer et al., 2008). Three different classes of xenobiotic compounds – genotoxic carcinogens, non-genotoxic carcinogens, and non-hepatocarcinogens – were analyzed. Male Wistar rats were treated for up to 14 days with selected

Results

Attractor and outlier probesets were estimated for each class of compounds using a linear elastic response model of gene expression as described in the methods section. The discrepancy between the measured gene expression and the network model, measured by its residual sum of squares (RSS) value, was calculated with respect to the allowed number of attractor genes n. Fig. 2 shows an example of the relationship between the model quality and number of genes. We chose a cut off for the number of

Discussion

Our work was based on expression profiles of stressed and unstressed liver tissue. From gene expression data a linear elastic network model was calculated and biomarkers were derived. The validation of the elastic response network model is twofold: (A) The biological role of the biomarkers is examined, in particular in comparison with a-priori expectations for the different compound classes. (B) Biomarkers are used to build classifiers of toxicity classes. The classifier is validated on an

Conclusion

Here a new method to identify biomarkers for specific stress states was developed. This method is based on describing global gene expression in terms of a linear network model. Validating this approach was two fold. It is claimed that the set of genes that define the network model, Gu, and the outliers to the model, Go, in conjunction define the response of the cell. If this is true, it shall be expected that the set of genes is biologically meaningful for the type of stress exerted. It is

Acknowledgements

Dr. Thomas Mrziglod supported the authors in the use of artificial neural networks. John Vitovsky kindly proof-read the manuscript.

We acknowledge financial support by the Bundesministerium für Bildung und Forschung (BMBF) for funding parts of this work through the HepatoSys network.

References (35)

  • A.H. Brivanlou et al.

    Signal transduction and the control of gene expression

    Science

    (2002)
  • B. Efron et al.

    Least angle regression

    Ann. Stat.

    (2004)
  • P. Foureman et al.

    Chemical mutagenesis testing in drosophila. ix. Results of 50 coded compounds tested for the national toxicology program

    Environ. Mol. Mutagen.

    (1994)
  • A. Fujita et al.

    Multivariate gene expression analysis reveals functional connectivity changes between normal/tumoral prostates

    BMC Syst. Biol.

    (2008)
  • T.S. Gardner et al.

    Inferring genetic networks and identifying compound mode of action via expression profiling

    Science

    (2003)
  • F. Girardot et al.

    Genome wide analysis of common and specific stress responses in adult drosophila melanogaster

    BMC Genomics

    (2004)
  • A. Gordon et al.

    Control of the mean number of false discoveries, bonferroni and stability of multiple testing

    Ann. Appl. Stat.

    (2007)
  • 1

    These authors contributed equally to this work.

    View full text