A Comparative Study of Machine Learning Methods for Detecting Promoters in Bacterial DNA Sequences

  • Conference paper
Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence (ICIC 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5227)

Abstract

Machine Learning methods have been widely used in bioinformatics, mainly for data classification and pattern recognition. The detection of genes in DNA sequences is still an open problem. Because the promoter region lies upstream of the gene itself, identifying it is an important aid to gene detection. This paper applies several Machine Learning methods to the construction of classifiers for detecting promoters in the DNA of Escherichia coli and compares the methods thoroughly. In general, probabilistic and neural network-based methods achieved the best accuracy rates.
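
To make the classification task concrete, the following is a minimal sketch of one probabilistic approach: a position-wise Naive Bayes classifier over fixed-length DNA windows, scored by accuracy rate. It is not the authors' implementation, and the abstract does not say which probabilistic methods were compared. The function names, the Laplace smoothing parameter `alpha`, and the eight-letter toy sequences are all assumptions made for illustration; the toy data are invented and are not taken from any E. coli promoter dataset. The positive examples simply embed TATAAT, the well-known -10 consensus hexamer.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_naive_bayes(sequences, labels, alpha=1.0):
    """Fit a position-wise Naive Bayes model on equal-length DNA strings.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    length = len(sequences[0])
    classes = sorted(set(labels))
    # counts[c][i][base]: how often `base` occurs at position i in class c
    counts = {c: [defaultdict(int) for _ in range(length)] for c in classes}
    totals = {c: 0 for c in classes}
    for seq, lab in zip(sequences, labels):
        totals[lab] += 1
        for i, base in enumerate(seq):
            counts[lab][i][base] += 1
    model = {"classes": classes, "log_prior": {}, "log_like": {}}
    for c in classes:
        model["log_prior"][c] = math.log(totals[c] / len(sequences))
        denom = totals[c] + alpha * len(ALPHABET)
        model["log_like"][c] = [
            {b: math.log((counts[c][i][b] + alpha) / denom) for b in ALPHABET}
            for i in range(length)
        ]
    return model

def predict(model, seq):
    """Return the class with the highest log-posterior score for seq."""
    scores = {
        c: model["log_prior"][c]
        + sum(model["log_like"][c][i][b] for i, b in enumerate(seq))
        for c in model["classes"]
    }
    return max(scores, key=scores.get)

def accuracy(model, sequences, labels):
    """Fraction of sequences whose predicted label matches the true label."""
    hits = sum(predict(model, s) == y for s, y in zip(sequences, labels))
    return hits / len(labels)

# Invented toy windows (NOT real data): positives embed the TATAAT hexamer.
toy_sequences = [
    "GTATAATG", "CTATAATC", "ATATAATG", "GTATAATA",
    "GCGCGGCC", "CCGGCGCG", "GGCCGCGC", "CGCGCCGG",
]
toy_labels = ["promoter"] * 4 + ["non-promoter"] * 4

toy_model = train_naive_bayes(toy_sequences, toy_labels)
```

Calling `predict(toy_model, "TTATAATT")` classifies a window not seen during training, and `accuracy(toy_model, toy_sequences, toy_labels)` gives the training-set accuracy. A real comparison would estimate accuracy on held-out data, for example by cross-validation, rather than on the training set.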

Editor information

De-Shuang Huang, Donald C. Wunsch II, Daniel S. Levine, Kang-Hyun Jo

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tavares, L.G., Lopes, H.S., Erig Lima, C.R. (2008). A Comparative Study of Machine Learning Methods for Detecting Promoters in Bacterial DNA Sequences. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2008. Lecture Notes in Computer Science, vol 5227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85984-0_115

  • DOI: https://doi.org/10.1007/978-3-540-85984-0_115

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85983-3

  • Online ISBN: 978-3-540-85984-0

  • eBook Packages: Computer Science, Computer Science (R0)
