Signal Background Estimation and Baseline Correction Algorithms for Accurate DNA Sequencing

Journal of VLSI signal processing systems for signal, image and video technology

Abstract

Accurate identification of a DNA sequence depends on the ability to precisely track the time-varying signal baseline in all parts of the electrophoretic trace. We propose a statistical learning formulation of the signal background estimation problem that can be solved using an Expectation-Maximization (EM) type algorithm. We also present an alternative method, based on a recursive histogram computation, for estimating the background level of a signal in small windows. Both background estimation algorithms introduced here can be combined with regression methods to track slow and fast baseline changes occurring in different regions of a DNA chromatogram. Accurate baseline tracking improves cluster separation and thus helps reduce classification errors when the Bayesian EM (BEM) base-calling system developed in our group (Pereira et al., Discrete Applied Mathematics, 2000) is employed to decide how many bases are "hidden" in every base-call event pattern extracted from the chromatogram.
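As a rough illustration of the windowed, histogram-based idea described in the abstract, the sketch below estimates a background level per window as the histogram mode and joins the window levels into a baseline. It is not the paper's algorithm: the function names (`modal_level`, `estimate_baseline`, `subtract_baseline`), the bin count, the number of refinement rounds, and the window size are all assumptions made for this example, and plain piecewise-linear interpolation stands in for the regression step.

```python
from bisect import bisect_right
from statistics import mean


def modal_level(window, bins=8, rounds=3):
    """Estimate one window's background level as a refined histogram mode.

    Assumed refinement (not taken from the paper): histogram the samples,
    keep only those in the most populated bin, and repeat on that subset.
    """
    samples = list(window)
    for _ in range(rounds):
        lo, hi = min(samples), max(samples)
        if hi == lo:  # all remaining samples are identical
            break
        width = (hi - lo) / bins
        # Bin index of every sample; the maximum value goes in the last bin.
        idx = [min(int((x - lo) / width), bins - 1) for x in samples]
        counts = [idx.count(b) for b in range(bins)]
        best = counts.index(max(counts))
        samples = [x for x, b in zip(samples, idx) if b == best]
    return mean(samples)


def estimate_baseline(trace, window=50):
    """Return one baseline value per sample of `trace`.

    Window levels are placed at window centers and joined by linear
    interpolation (a simple stand-in for a regression fit).
    """
    centers, levels = [], []
    for start in range(0, len(trace), window):
        chunk = trace[start:start + window]
        centers.append(start + (len(chunk) - 1) / 2)
        levels.append(modal_level(chunk))

    baseline = []
    for i in range(len(trace)):
        k = bisect_right(centers, i)
        if k == 0:  # before the first center: hold the first level
            baseline.append(levels[0])
        elif k == len(centers):  # at or past the last center: hold the last
            baseline.append(levels[-1])
        else:
            t = (i - centers[k - 1]) / (centers[k] - centers[k - 1])
            baseline.append(levels[k - 1] + t * (levels[k] - levels[k - 1]))
    return baseline


def subtract_baseline(trace, window=50):
    """Baseline-corrected trace: each sample minus its estimated baseline."""
    return [x - b for x, b in zip(trace, estimate_baseline(trace, window))]


if __name__ == "__main__":
    # Synthetic trace: flat background of 10 with a few narrow peaks.
    demo = [10.0] * 200
    for i in range(20, 200, 40):
        demo[i] += 100.0
    print(subtract_baseline(demo)[18:23])
```

The mode is used rather than the mean because peaks occupy only a minority of samples in a window, so the most populated histogram bin tends to sit at the background level. Smaller windows follow fast baseline changes more closely but are more easily dominated by peaks.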

References

  1. L. Alphey, DNA Sequencing: From Experimental Methods to Bioinformatics, Springer-Verlag, 1997.

  2. T.A. Brown, DNA Sequencing: The Basics, Oxford University Press, 1994.

  3. D. Micklos and G. Freyer, Primer on Molecular Genetics, U.S. Dept. of Energy, 1992.

  4. J. Forrester, "Interpreting DNA Sequencing Results," http://biotech.missouri.edu/dnacore/.

  5. Perkin-Elmer, ABI PRISM, DNA Sequencing Analysis Software, User's Manual, Applied Biosystems, Foster City, CA, 1996.

  6. M. Pereira, L. Andrade, S. El-Difrawy, B. Karger, and E. Manolakos, "Statistical Learning Formulation of the DNA Base-Calling Problem and its Solution Using a Bayesian EM Framework," Discrete Applied Mathematics, vol. 104, no. 1-3, 2000, pp. 229-258.

  7. B. Ewing, L. Hillier, M. Wendl, and P. Green, "Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment," Genome Research, vol. 8, 1998, pp. 175-185.

  8. L. Andrade and E. Manolakos, "Skyline Normalization of DNA Chromatograms by Regression," in Workshop on Genomic Signal Processing and Statistics (GENSIPS), 2002, pp. CP2-07:1-4.

  9. H. Fujii and K. Kashiwagi, "Compensation for Mobility Inequalities between Lanes Computed from Band Signals in On-line Fluorescence DNA Sequencing," Electrophoresis, vol. 13, 1992, pp. 500-505.

  10. S. El-Difrawy and E. Manolakos, "An Analytical Solution to the Mobility Shifts Correction Problem for DNA Chromatograms," in Workshop on Genomic Signal Processing and Statistics (GENSIPS), 2002, pp. CP2-05:1-4.

  11. C.G. Molina and J. Mullikin, "A Probabilistic Approach for Long Read-Length DNA Sequence Analysis," in IEEE Workshop on Neural Networks for Signal Processing (NNSP), Sept. 2002, pp. 45-56.

  12. L. Andrade and E. Manolakos, "Accurate Estimation of the Signal Baseline in DNA Chromatograms," in IEEE Workshop on Neural Networks for Signal Processing (NNSP), Sept. 2002, pp. 35-44.

  13. J. Golden, D. Torgersen, and C. Tibbetts, "Pattern Recognition for Automated DNA Sequencing: I. On-line Signal Conditioning and Feature Extraction for Base-Calling," in First International Conference on Intelligent Systems for Molecular Biology, AAAI Press, 1993, pp. 136-144.

  14. Z. Yin, J. Severin, M.C. Giddings, W. Huang, M.S. Westphall, and L.M. Smith, "Automatic Matrix Determination in Four Dye Fluorescence-Based DNA Sequencing," Electrophoresis, vol. 17, 1996, pp. 1143-1150.

  15. M.C. Giddings, J. Severin, M. Westphall, J. Wu, and L.M. Smith, "A Software System for Data Analysis in Automated DNA Sequencing," Genome Research, vol. 8, 1998, pp. 644-665.

  16. D. Brady, M. Kocic, A. Miller, and B. Karger, "Maximum Likelihood Base-Calling for DNA Sequencing," IEEE Trans. on Biomedical Engineering, vol. 47, no. 9, 2000, pp. 1271-1280.

  17. A. Berno, "A Graph Theoretic Approach to the Analysis of DNA Sequencing Data," Genome Research, vol. 6, no. 2, 1996, pp. 80-91.

  18. T.K. Moon, "The Expectation-Maximization Algorithm," IEEE Signal Processing Magazine, vol. 13, no. 6, 1996, pp. 47-60.

  19. D. Walther, G. Bartha, and M. Morris, "Base-Calling with LifeTrace," Genome Research, vol. 11, 2001, pp. 875-888.

  20. T.D. Yager, L. Baron, R. Batra, A. Bouevitch, D. Chan, K. Chan, S. Darasch, R. Gilchrist, A. Izmailov, J.M. Lacroix, K. Marchelleta, J. Renfrew, D. Rushlow, E. Steinbach, C. Ton, P. Waterhouse, H. Zaleski, J.M. Dunn, and J. Stevens, "High performance DNA Sequencing, and the detection of Mutations and Polymorphisms, on the Clipper Sequencer," Electrophoresis, vol. 20, 1999, pp. 1280-1300.

  21. S.B. Needleman and C.D. Wunsch, "A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of two Proteins," Journal of Molecular Biology, vol. 48, 1970, pp. 443-453.

  22. T.F. Smith and M.S. Waterman, "Identification of Common Molecular Subsequences," Journal of Molecular Biology, vol. 147, 1981, pp. 195-197.

Andrade, L., Manolakos, E.S. Signal Background Estimation and Baseline Correction Algorithms for Accurate DNA Sequencing. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 35, 229–243 (2003). https://doi.org/10.1023/B:VLSI.0000003022.86639.1f
