Adaptive bandwidth selection for biomarker discovery in mass spectrometry

https://doi.org/10.1016/j.artmed.2008.08.010

Summary

Objective

Differential quantification of proteins by liquid chromatography/mass spectrometry requires the alignment of the retention time axis. The alignment automatically corrects for retention time shifts that occur in the liquid chromatography unit between repeated experiments.

Methods

We present an extension of non-negative canonical correlation analysis. We introduce an adaptive scale space estimation that adapts the complexity of a monotone regression function to the density of measurements along the retention time axis. Furthermore, the global model selection of the scale is replaced by a local one: instead of a single global parameter that holds for all time axes, we estimate the scale for each individual time axis.

Results

Experiments show a performance gain of 13%. The gain is measured as the number of proteins that are detected to differ significantly in abundance between two different biological samples.

Conclusion

We conclude that adaptive scale estimation combined with local model selection can outperform global model selection, yielding a more effective selection of differentially abundant proteins.

Introduction

In recent years, liquid chromatography coupled to mass spectrometry (LC/MS) has become the technology of choice for the quantitative analysis of proteins. In a liquid chromatography experiment, peptides are loaded on a column and elute from it after a specific retention time, giving rise to measurements in the form of time series. At fixed time intervals, the MS part of the measuring device produces a mass/charge spectrum.

The classification of a protein sample according to some phenotypes is certainly one of the major goals in quantitative proteomics. When comparing two biological samples measured with LC/MS, however, one often observes a non-linear time deformation between consecutive experiments, which introduces a severe alignment problem. When aligning two mass spectrometry experiments, it is desirable to include as many mass peaks as possible in the matching process while keeping the number of false matches as low as possible. In this work we consider a common experimental setup in which a small fraction of peptides have been identified by tandem mass spectrometry. In the following we refer to this experimental setup as LC/MS/MS.

Several methods for aligning LC/MS experiments have been proposed in the literature. Popular examples include correlation optimized warping (COW) [1], where piece-wise linear functions are fitted to align the time series, hidden Markov models [2], hierarchical clustering [3], and robust point matching [4]. The non-negative canonical correlation analysis (NN-CCA) method presented in [5], [6] extends previous pairwise alignment methods to a multiple alignment model and combines peaks from identified and unidentified peptides in a semi-supervised way.

In this paper we generalize the idea of optimizing a non-negative correlation function between LC/MS/MS experiments in two ways: (i) we introduce an adaptive scale space estimation for complexity tuning of the time-warping functions, (ii) the usual global model selection procedure is replaced by a local variant for each individual time axis. Large-scale experiments demonstrate that these extensions lead to a significant increase in the number of differentially abundant proteins.
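The idea of adapting the smoothness of a monotone time-warping function to the local density of measurements can be illustrated with a minimal sketch. It is not the NN-CCA procedure of the paper: it assumes a simple pairwise setting in which identified peptides give anchor pairs of retention times (`rt_a`, `rt_b`), uses a Gaussian kernel smoother whose bandwidth is the distance to the k-th nearest anchor, and enforces monotonicity with the pool-adjacent-violators algorithm. All function names and the parameter `k` are hypothetical.

```python
import numpy as np

def knn_bandwidths(x_anchor, x_query, k=5):
    """Per-query bandwidth: distance to the k-th nearest anchor.
    Sparse regions get a wide bandwidth, dense regions a narrow one."""
    d = np.abs(x_query[:, None] - x_anchor[None, :])
    k = min(k, len(x_anchor))
    return np.sort(d, axis=1)[:, k - 1] + 1e-9  # small offset avoids division by zero

def adaptive_kernel_smooth(x_anchor, y_anchor, x_query, k=5):
    """Gaussian kernel regression with a locally adapted bandwidth (assumed scheme)."""
    h = knn_bandwidths(x_anchor, x_query, k)
    z = (x_query[:, None] - x_anchor[None, :]) / h[:, None]
    w = np.exp(-0.5 * z ** 2)
    return (w @ y_anchor) / w.sum(axis=1)

def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to the sequence y."""
    blocks = []  # each block is [mean, number of pooled points]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    return np.concatenate([np.full(w, m) for m, w in blocks])

def fit_monotone_warp(rt_a, rt_b, grid, k=5):
    """Map retention times of run A onto run B on a sorted grid."""
    smooth = adaptive_kernel_smooth(rt_a, rt_b, grid, k)
    return pava(smooth)

# Synthetic example: 40 anchor peptides with a smooth non-linear time shift.
rng = np.random.default_rng(0)
rt_a = np.sort(rng.uniform(0, 100, 40))
rt_b = rt_a + 3 * np.sin(rt_a / 30) + rng.normal(0, 0.3, 40)
grid = np.linspace(0, 100, 50)  # must be sorted in increasing order
warp = fit_monotone_warp(rt_a, rt_b, grid)
```

In this sketch `k` plays the role of the scale parameter. Choosing it separately for each experiment, for example by cross-validation on held-out anchors, would loosely correspond to the local model selection described above, whereas a single shared `k` would correspond to the global variant.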

Section snippets

Material studied, methods, techniques

In quantitative proteomics one is typically interested in discriminating between classes of proteins in a complex sample (e.g. a blood plasma sample). These classes are usually defined by specific phenotypes (e.g. diseased and healthy). One of the fundamental questions in settings of this kind is the ability to identify those proteins that play a dominant role in the actual discrimination task. The search for such relevant proteins is usually referred to as biomarker discovery.
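Counting the proteins that differ between two classes can be sketched as follows. The statistical test actually used in the paper is not given in this excerpt, so the example assumes a plain two-sample Student's t-test on log intensities with three replicates per class and no multiple-testing correction. The function name and the input layout are hypothetical.

```python
import numpy as np

# Two-sided 5% critical value of Student's t with 4 degrees of freedom,
# valid only for the assumed design of 3 + 3 replicates.
T_CRIT_DF4 = 2.776

def count_differential(log_a, log_b, t_crit=T_CRIT_DF4):
    """Count proteins whose mean log intensity differs between two classes.

    log_a, log_b: arrays of shape (n_proteins, n_replicates), one per class.
    Returns the count and the per-protein t statistics.
    """
    na, nb = log_a.shape[1], log_b.shape[1]
    mean_diff = log_a.mean(axis=1) - log_b.mean(axis=1)
    # Pooled variance under the equal-variance assumption.
    pooled_var = ((na - 1) * log_a.var(axis=1, ddof=1)
                  + (nb - 1) * log_b.var(axis=1, ddof=1)) / (na + nb - 2)
    t = mean_diff / np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return int(np.sum(np.abs(t) > t_crit)), t

# Toy data: protein 0 differs strongly between the classes, protein 1 does not.
log_a = np.array([[10.0, 10.1, 9.9], [5.0, 5.1, 4.9]])
log_b = np.array([[12.0, 12.1, 11.9], [5.05, 4.95, 5.0]])
n_diff, t_stats = count_differential(log_a, log_b)
```

A better alignment feeds more consistently matched peaks into such a test, which is how alignment quality can translate into a larger number of detected proteins.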

In the

Results

The method is evaluated on a set of six LC/MS experiments, which represent two different biological conditions with three replicates each. The data come from an experiment with Arabidopsis thaliana cell cultures grown under light and dark conditions. All six experiments are aligned by applying adaptive scale estimation and model selection for each individual experiment. After predicting the retention time, we extracted the ion count of each peak. Before estimating of the log-expression

Conclusion

In this paper, we have shown that an improved alignment of liquid chromatography/mass spectrometry experiments increases the number of proteins found to be significantly differentially abundant. The non-negative canonical correlation analysis is improved in two ways: first, the smoothing of the monotone regression is adapted to the local density of peptide identifications. Second, a global model selection scheme is replaced by a local model selection scheme. The adaptive scale

Acknowledgements

This work is partially supported by ETH grant nr. TH-5/04-3 and CC-SPMD (a competence center of SystemsX, the Swiss Initiative in Systems Biology).

References (8)

  • N-P. Vest Nielsen et al. Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping. J Chromatogr A (1998)
  • J. Listgarten et al. Multiple alignment of continuous time series.
  • R. Tibshirani et al. Sample classification from protein mass spectrometry, by peak probability contrasts. Bioinformatics (2004)
  • M. Kirchner et al. Amsrpm: Robust point matching for retention time alignment of LC/MS data with R. J Stat Software (2007)
There are more references available in the full text version of this article.