An objective, model-independent method for detection of non-uniform steps in noisy signals

https://doi.org/10.1016/j.cpc.2008.06.008

Abstract

Biophysical techniques, such as single molecule FRET, fluorescence microscopy, single ion-channel patch clamping, and optical tweezers often yield data that are noisy time series containing discrete steps. Here we present a method enabling objective identification of nonuniform steps present in such noisy data. Our method does not require the assumption of any underlying kinetic or state models and is thus particularly useful for analysis of novel and poorly understood systems. In contrast to other model-independent methods, no parameters or other information is taken from the user. We find that, at high noise levels, our method exceeds the performance of other model-independent methods in accurately locating steps in simulated noisy data.

Introduction

Physical measurements of single (bio)molecules have enabled the detection of biochemical and structural states and the transitions between them, which otherwise would have been obscured in ensemble-averaged experiments. The time evolution of many such biomolecular systems as observed with single-molecule techniques may be characterized by dwells punctuated by rapid transitions, often called steps. A classic example results from the patch-clamp technique monitoring the open and closed states of a single trans-membrane channel [1]. More recent advances using optical tweezers and fluorescence microscopy have uncovered staircase displacement records of individual motor proteins [2], [3], [4], while single-pair FRET has provided unprecedented insight into structural heterogeneity and conformational kinetics of individual enzymes and ribozymes [5], [6], to name only a few of an increasing arsenal of single-molecule methods.

Measuring the distributions of step sizes and durations, rather than just their mean value, has provided invaluable clues towards understanding the structural and kinetic mechanisms underlying biomolecules' function [7]. Thus it is imperative to be able to reliably detect the size and durations of steps in these data, which are typically noisy. When the size of the step far exceeds the typical noise levels associated with the experiment, one may choose to identify steps by eye, possibly without much danger of experimentalists' bias and without much penalty towards accuracy [8]. Such visual identification may be aided through use of filters such as low-pass filters, Chung and Kennedy's nonlinear filter [9], or Haar wavelet-based filters [10].

However, once S/N, the ratio of the step size to the standard deviation of the noise, decreases, such subjective methods no longer suffice and objective methods guaranteeing a high degree of reproducibility are required. Depending on the biomolecular system and the design of the experiment, these methods may be as simple as constructing a histogram of the raw data or computing correlation functions to uncover step sizes and rates, respectively. For example, in the case of processive motor proteins a pairwise histogram directly yields the step height [2], [11]. Further analysis of the variance among different displacement records can also provide estimates of the mean step size when individual steps cannot be recognized, but requires assumptions about the statistical distribution of step durations [12], [13], [14], [15], [16]. Furthermore, these methods require that the step height not vary within the data record under analysis.

Other methods based on the theory of hidden Markov models are quite successful, but also make use of assumed underlying kinetic models to locate steps of arbitrary size and directly estimate the parameters of the model [17], [18]. When no such model is known or when assuming a model is undesirable, more general automatic methods, including velocity thresholding used by Hua et al. [19], wavelet component or multiscale product thresholding as discussed by Wang [20] or Sadler and Swami [21], a sliding window Student t-test used by Carter and Cross [3], and step-function fitting with an heuristic termination criterion as developed by Kerssemakers [22] are available.

All of the above methods take from the user at least one parameter, usually corresponding to a physical time, length, or intensity scale. The accuracy with which steps are detected depends, often strongly, on the choice of this parameter, with a tradeoff between missing rapid or small steps and splitting single events into multiple steps or introducing spurious steps. For example, the sliding-window t-test yields ambiguous and, at times, unreliable results when two steps are near to each other in time. The step-fitting method of Kerssemakers et al. is notable in that its user-selected parameter corresponds not to a scale but to the number of steps fit to the data, with an heuristic, the ratio between χ² for the fit and χ² for a “counter-fit” with steps placed in the best fit's dwells, introduced to guide selection [22]. The authors note that the optimal parameter differs slightly from their theoretical framework's prediction, but provide no quantitative means to make the adjustment.

A recent review by Carter et al. compared four model-independent step detection methods, those of Carter, Hua, Sadler, and Kerssemakers, and found that the Kerssemakers method was the most accurate given the chosen criteria, especially when the signal-to-noise ratio approached 1.6, the lowest level tested [23]. Parameters of the other three methods were empirically optimized using simulated data; the Kerssemakers method's parameter was taken as its theoretical optimum. Pre-filters yielding the best result were likewise chosen empirically for each method using simulated data. No method was accurate enough at high noise levels to be compelling. Moreover, such empirical parameter optimization and filter selection are not available to experimenters who cannot make assumptions about the underlying kinetic process.

We present a method for model-independent (i.e. not assuming a kinetic model) step detection which makes use of the Schwarz Information Criterion for statistical model selection to determine the number and location of steps [24]. Importantly, our method takes no input from the user other than the data itself, and makes no use of any assumed threshold values or confidence levels. Detection of steps is thus entirely automatic and objective, based solely on the properties of the data. To illustrate the challenges with which one is presented when identifying individual steps in typical single-molecule biophysical data, the resulting fit to the position of a single kinesin molecule pulling a bead against a constant applied force is plotted in Fig. 1 [15].

Section snippets

Objective test statistic for finding steps

Determining the number and location of steps in noisy data is an example of a general class of problem known as change-point problems, which involve estimating the number and location of changes in the underlying distribution of a series of random trials, most usually a change in statistical moments, such as the mean or variance [25]. Change-point problems have been extensively treated both as a problem of theoretical statistics and in applications to industrial quality control and finance [26]

Successive step placement

Schwarz Information Criterion hypothesis tests are simple in principle: using all of the data, one computes a single number, the SIC, for each hypothesis in question and accepts the hypothesis for which this number is the lowest. As the hypotheses are judged only with respect to each other, there is no confidence level or threshold assumed or hidden in the algorithm. In the step-detection case, these competing hypotheses are the different numbers and locations of steps. Despite such
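The idea of comparing hypotheses by their SIC can be sketched as follows. This is a minimal illustration and not the authors' implementation: the snippet above does not give the SIC expression, so the form used here, (k + 2) ln n + n ln σ̂² for k steps in n points with a common residual variance σ̂², is an assumption based on the standard Schwarz criterion for a piecewise-constant mean in Gaussian white noise. The function names `sic` and `find_steps` are hypothetical, and the brute-force search is written for clarity rather than speed.

```python
import math


def sic(data, steps):
    """SIC of a piecewise-constant fit with change points at indices `steps`.

    Assumed form: (k + 2) * ln(n) + n * ln(residual variance), where the
    variance is pooled over all dwells (Gaussian white noise assumption).
    """
    n = len(data)
    bounds = [0] + sorted(steps) + [n]
    rss = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = data[a:b]
        mean = sum(seg) / len(seg)
        rss += sum((x - mean) ** 2 for x in seg)
    var = max(rss / n, 1e-300)  # guard against log(0) on noiseless data
    return (len(steps) + 2) * math.log(n) + n * math.log(var)


def find_steps(data):
    """Add one step at a time, keeping each only if it lowers the SIC."""
    steps = []
    best = sic(data, steps)
    while True:
        candidates = [i for i in range(1, len(data)) if i not in steps]
        if not candidates:
            break
        # Best single additional step given the steps already placed.
        trial_sic, trial_i = min((sic(data, steps + [i]), i) for i in candidates)
        if trial_sic >= best:
            break  # no further step improves the criterion
        steps.append(trial_i)
        best = trial_sic
    return sorted(steps)


# One step of height 5 at index 20, with a small alternating disturbance.
trace = [0.1 * (-1) ** i for i in range(20)] + [5 + 0.1 * (-1) ** i for i in range(20, 40)]
print(find_steps(trace))
```

Note that the loop stops purely because no candidate lowers the SIC; no threshold or confidence level is supplied by the caller.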

Results

Applications of our step detection method are shown in Figs. 1 and 2, which find steps in a displacement record of the motor protein kinesin and in simulated data traces, respectively. However, objective tests are required to quantitatively evaluate the performance of our method. To facilitate a direct comparison to existing methods, we have adopted the test criteria introduced by Carter et al., who recently reviewed the most commonly used step detection methods [23]:

  • (1) Successful identification of a

Discussion

Our method takes no input from the user except the data itself. We do no internal heuristic parameter setting; the only assumption made is that the data take the form of steps with Gaussian white noise. In a sense, the algorithm presented above is, to the best of our knowledge, the first fully objective model-independent method for detection of steps in noisy data. We have found that under most circumstances, especially at high noise, we exceed the performance of step-finding algorithms with

Algorithm

Our step-finding algorithm was coded in C and interfaced to LabView (National Instruments, Austin, TX) using a Call Library Function node. A linked-list data structure was used to keep track of dwell durations, positions, and variances, reducing the number of multiplications to be performed per iteration.
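One way such a linked list might look is sketched below in Python rather than C. This is an illustration under assumptions, not the authors' code: each node caches the sample count, sum, and sum of squares of its dwell, so the mean and residual sum of squares of untouched dwells never need to be recomputed when a new step is placed. The names `Dwell`, `make_dwell`, and `split` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dwell:
    start: int        # index of the first sample in this dwell
    count: int        # number of samples (dwell duration)
    total: float      # cached sum of samples
    total_sq: float   # cached sum of squared samples
    next: Optional["Dwell"] = None

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def rss(self) -> float:
        # Residual sum of squares about the dwell mean, from cached sums.
        return self.total_sq - self.total ** 2 / self.count


def make_dwell(data, start, stop):
    seg = data[start:stop]
    return Dwell(start, len(seg), sum(seg), sum(x * x for x in seg))


def split(dwell, data, at):
    """Split `dwell` at absolute index `at`; only the two new pieces are summed."""
    stop = dwell.start + dwell.count
    left = make_dwell(data, dwell.start, at)
    right = make_dwell(data, at, stop)
    right.next = dwell.next
    dwell.count, dwell.total, dwell.total_sq = left.count, left.total, left.total_sq
    dwell.next = right
    return right


samples = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]
head = make_dwell(samples, 0, len(samples))
split(head, samples, 3)
print(head.mean, head.next.mean)
```

With this layout, the pooled variance needed after adding a step is the sum of the cached `rss` values along the list divided by the number of samples.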

Generation of benchmark data

To directly test our step-finding algorithm, we generate series of sharp steps with randomly distributed heights or dwell lengths (as specified above), adding Gaussian white noise. Unless
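A benchmark trace of this kind could be generated along the following lines. The distributions are not specified in the snippet above, so the exponentially distributed dwell lengths and Gaussian step heights used here are assumptions chosen only for illustration, as are the function name `make_step_trace` and all of its parameters.

```python
import random


def make_step_trace(n_steps, mean_dwell, step_sigma, noise_sigma, seed=0):
    """Return (trace, step_indices) for a noisy staircase.

    Assumptions for illustration: dwell lengths are exponential with mean
    `mean_dwell` samples (at least 1), step heights are Gaussian with
    standard deviation `step_sigma`, and the noise is Gaussian white noise
    with standard deviation `noise_sigma`.
    """
    rng = random.Random(seed)
    trace, boundaries, level = [], [], 0.0
    for _ in range(n_steps + 1):
        dwell = 1 + int(rng.expovariate(1.0 / mean_dwell))
        trace.extend(level + rng.gauss(0.0, noise_sigma) for _ in range(dwell))
        boundaries.append(len(trace))      # index where the next dwell starts
        level += rng.gauss(0.0, step_sigma)
    return trace, boundaries[:-1]          # last boundary is the end of the trace


trace, true_steps = make_step_trace(n_steps=5, mean_dwell=50, step_sigma=8.0, noise_sigma=2.0)
print(len(trace), true_steps)
```

Keeping the true step indices alongside the trace allows detected steps to be scored against the known ground truth.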

Acknowledgements

This work was supported by the BIO5 Institute. B. K. was the recipient of a Biology, Math, and Physics Initiative fellowship.

We are grateful to Joe Watkins for discussions, Jie Chen for clarification of some of the mathematical literature, and Tom Perkins for testing several preliminary incarnations of our step-detection algorithm.

References (38)

  • S. Chung et al., J. Neurosci. Methods (1991)
  • K. Svoboda et al., Cell (1994)
  • L.S. Milescu et al., Biophys. J. (2006)
  • L.S. Milescu et al., Biophys. J. (2006)
  • B.C. Carter et al., Biophys. J. (2008)
  • J. Rissanen, Automatica (1978)
  • E. Neher et al., Nature (1976)
  • K. Svoboda et al., Nature (1993)
  • N. Carter et al., Nature (2005)
  • S. Toba et al., Proc. Natl. Acad. Sci. USA (2006)
  • T. Ha, Proc. Natl. Acad. Sci. USA (1999)
  • X. Zhuang, Science (2000)
  • J.C.M. Gebhardt et al., Proc. Natl. Acad. Sci. USA (2006)
  • C. Kural, Science (2005)
  • M.J. de Castro et al., Nat. Cell. Biol. (2000)
  • E.A. Abbondanzieri et al., Nature (2005)
  • K. Svoboda et al., Proc. Natl. Acad. Sci. USA (1994)
  • M.J. Schnitzer et al., Nature (1997)
  • K. Visscher et al., Nature (1999)