A general probabilistic model of the PCR process

https://doi.org/10.1016/j.amc.2006.01.092Get rights and content

Abstract

This paper rigorously derives a general probabilistic model for the PCR process; this model includes as a special case the Velikanov–Kapral model where all nucleotide reaction rates are the same. In this model the probability of binding of deoxy-nucleoside triphosphate (dNTP) molecules with template strands is derived from the microscopic chemical kinetics. A recursive solution for the probability distribution of binding of dNTPs is developed for a single cycle and is used to calculate expected yield for a multicycle PCR. The model is able to reproduce important features of the PCR amplification process quantitatively. With a set of favorable reaction conditions, the amplification of the target sequence is fast enough to rapidly outnumber all side products. Furthermore, the final yield of the target sequence in a multicycle PCR run always approaches an asymptotic limit that is less than one. The amplification process itself is highly sensitive to initial concentrations and the reaction rates of addition to the template strand of each type of dNTP in the solution.

Introduction

The polymerase chain reaction (PCR) is a powerful technique used for the amplification of specific segments of DNA or mRNA. Over the past decade the PCR has rapidly become a technique of choice for researchers in bioinformatics due to its capabilities of detecting and amplifying low copy segments. However, in practice, the PCR process does not always have a consistent relation between the initial target amount and the absolute amount of the synthesized product. This is due to the PCR’s high sensitivity to several variables that includes intractable variations between the containers where the reaction is carried out. Therefore, comparisons of the amount of product to that of an external control standard do not always lead to accurate quantifications. This problem, however, is addressed in quantitative competitive PCR (QC-PCR).

In the QC-PCR, a competitive mRNA or DNA, namely an allelic variant of the target template, is used as an internal standard to provide an internal control in the amplification process. Quantification is managed by determining the amounts of co-amplified products from replicated proportions of the target with the dilution series of the competitor. A normalization based on co-amplification of a heterologous sequence, however, does not optimally address the difference of yield due to different template efficiencies in the amplification. It is quite difficult to rigorously quantify these differences.

Another popular version of PCR is called real time PCR (RT-PCR) and is used widely as an industry standard. It can be used also to validate gene expression data obtained from microarray experiments. Determining yield by following the real time kinetics of PCR eliminates the need for a competitor to be co-amplified with the target for the internal standard. Quantitation can be performed by the more basic method of preparing a standard curve and determining an unknown amount by comparison to the standard curve. Real time PCR quantitation eliminates post-PCR processing of PCR products (which is necessary in QC-PCR). This helps to increase throughput, reduce the chances of carryover contamination, and remove post-PCR processing as a potential source of error.

PCR is an extremely important process from a biologist’s point of view, with application in clinical microbiology, food microbiology, and veterinary microbiology. It is also a very important technique in the areas of clinical oncology, minimal residual disease, and chromosomal translocations, but perhaps one of the most important uses of PCR is in microarray experiments of gene expression. In this case, accuracy of quantification is extremely important as a slight variation in estimating initial concentration of mRNA (or cDNA) can lead to false conclusions. Due to the exponential nature of the PCR, estimating this initial concentration accurately is difficult because of errors from various sources. Moreover, these sensitivities increase with cycle number and the process becomes nonlinear beyond a certain cycle number.

Therefore, in spite of the popularity of the PCR, theoretical considerations to reliably describe its action in different applications have been limited mostly to experimental inferences rather than mathematical derivations from biophysical principles; consequently, the currently used expressions lack consistency since their foundations have not been clearly established, frequently leading to empirical fitting procedures of experimental data that result in poor quantifications. It is clear that a physics based model will predict the yield of the PCR amplification in a much better fashion; however, developing any type of generalized model is not an easy task. The present study attempts to develop such a model. Before going into details of the formulation for this model some background on the PCR amplification is given.

The PCR amplification process in general is conducted in vitro. The three primary ingredients for this process are the three nucleic acid segments: a double-stranded DNA containing the sequence to be amplified and two single-stranded primers. They react in an environment containing a DNA polymerase enzyme, deoxy-nucleoside triphosphates (dNTPs), a buffer, and a magnesium salt. Through cycles of combined denaturing, annealing (a vast number of primers is added to ensure complete annealing), and DNA synthesis, the primers hybridize to opposite strands of the target sequence such that the synthesis stage proceeds across the region between the primers, thus doubling the DNA amount. Therefore, the products formed in successive cycles should result in geometric accumulation and the target amplification after n cycles can be approximated byNn=2nN0,where N0 is the initial amount of DNA segment to be amplified.

The quantitative reliability of the PCR, however, is limited by the amplification process itself. Due to its geometric nature, small differences in any of the control variables will dramatically affect the reaction yield. The variables that influence the yield of the PCR process are the concentration of the DNA polymerase, dNTPs, MgCl2, DNA, and primers; the denaturing, annealing and synthesis temperature; the length and the number of cycles; ramping times, and the presence of contaminating DNA and inhibitors in the sample. Even if extreme care is taken to strictly control these parameters the tube–tube variation may sometimes affect the outcome of the reaction. The physical basis of such variation is not yet known. Some researchers [1], [2] indicated that this variation might be due to small temperature differences along the thermal cylinder block during the first few cycles. According to Wang et al. [3] and Gilliland et al. [4], normalization based on co-amplification does not optimally address the variation of yield due to the difference in template efficiencies. In reality it is a well-observed fact that the reaction efficiency is never 100% and does not remain constant during the cycles. Hence, the accumulation trend is better represented asNn=i=1n(1+ϵi)N0,where ϵi is the cycle efficiency and is estimated empirically from the experimental data.

A different deterministic and more physics based approach was proposed by Schnell and Mendoza [5], [6] who used the law of mass action to derive the kinetic equations for PCR. Stochastic models for PCR have also been developed [7], [8], [9], [10], [11], [12]. Finally, a combined deterministic and stochastic approach was proposed by Stolovitzky and Cecchi [13]. They used a deterministic mass action equation to compute the amplification efficiency and estimate the number of PCR cycles. Although these models lead to a better quantification for the phenomenon, they still do not provide a complete solution because the efficiency was assumed to be approximately constant during all cycles.

Velikanov and Kapral [14] proposed a probabilistic approach to the kinetics of the PCR, which focused on the microscopic nature of the amplification process. Their results indicated that the model was able to reproduce the main qualitative features of PCR kinetics, namely sensitivity to reaction conditions and leveling-off of the yield with increasing number of cycles (the plateau effect). Though they were able to obtain a closed form solution for their model, it required the strong assumption that all nucleotides were identical, which is never the case in reality. The chemical kinetics of nucleotides binding to the template strand is highly dependent on the type of the nucleotide. Moreover, the initial number of each type of nucleotide at the beginning of each cycle may not be the same and may influence the dynamics of the reaction in subsequent cycles. In this paper, the master equation developed by Velikanov and Kapral [14] is modified to accommodate the fact that the initial template strand consists of the four different types of nucleotides, namely A, C, T, and G, and that the initial numbers of these nucleotides present in the buffer solution at the beginning of each cycle are independent of each other. For completeness, the closed form solution for the Velikanov–Kapral model is rederived with rigorous mathematics, from which can be seen clearly the specific tacit assumptions that are required for this closed form solution to hold. These assumptions are then removed for the more general model proposed here.

Section snippets

Polymerase chain reaction: a probabilistic approach

Let the length of the template strand be L and the length of the growing strand be ℓ at a given time t, ℓ0 being the length at t = 0. A reasonable assumption is that, at the molecular level, the probability of a reaction event is proportionate to the number of ways in which the molecules of the reactants available in the system can be combined for the reaction to take place. It can be further assumed that the template strand consists of all of the four different nucleotides (A, C, T, and G) in an

Results and discussion

Eq. (26) was solved numerically and the probability functions were plotted in Fig. 1, Fig. 2, Fig. 3, Fig. 4. These figures represent the probability function P(,η) with respect to ℓ and η for different template strands of equal length. The strands in Fig. 1, Fig. 2 consist of types T and C nucleotides with a single transition in the base pair sequence at the 11th base pair. For Fig. 1, it is assumed that the initial number (i.e., concentration) of type A (complement of type T) nucleotides is

Acknowledgements

The authors gratefully acknowledge the generous assistance of Dr. Ruth Grene and Dr. Gregory Gonye.

References (17)

There are more references available in the full text version of this article.

This work was supported in part by NSF Grants EIA-0103660, IBN-0219322, and AFRL Grant F30602-01-2-0572.

View full text