
Training Neural Networks for Protein Secondary Structure Prediction: The Effects of Imbalanced Data Set

  • Conference paper
Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence (ICIC 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5755)

Abstract

Protein secondary structure prediction (PSSP) is one of the main tasks in computational biology. Over the last few decades, much effort has gone into solving this problem with various approaches, mainly artificial neural networks (ANNs). ANNs for secondary structure prediction are generally trained on the CB513 data set. Like protein structure databases, this data set is imbalanced, which can lead to a low error rate for the majority class and an undesirably high error rate for the minority class. In this paper we evaluate the effects of an imbalanced data set on the training and learning of neural networks applied to protein secondary structure prediction. To this end, we applied resampling methods to tackle the class imbalance problem. Results show that imbalanced data sets decrease the helix prediction rates, although the class distribution of the protein data set does not significantly affect the global accuracy (Q3).
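
The distinction the abstract draws between global accuracy (Q3) and per-class prediction rates can be illustrated with a minimal Python sketch. It assumes the usual three-state labeling (H for helix, E for strand, C for coil) and uses plain random oversampling as a stand-in resampling method; the abstract does not say which resampling methods the authors used. The function names and the toy label strings are hypothetical and are not taken from the paper or from CB513.

```python
import random
from collections import Counter


def q3(true_states, pred_states):
    """Global three-state accuracy: fraction of residues predicted correctly."""
    if len(true_states) != len(pred_states):
        raise ValueError("label sequences must have the same length")
    correct = sum(t == p for t, p in zip(true_states, pred_states))
    return correct / len(true_states)


def per_class_accuracy(true_states, pred_states):
    """Fraction of correctly predicted residues within each true class."""
    totals = Counter(true_states)
    hits = Counter(t for t, p in zip(true_states, pred_states) if t == p)
    return {state: hits[state] / totals[state] for state in totals}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until every
    class has as many examples as the largest one (assumed stand-in method)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy labels, invented for illustration only: one class dominates.
true_labels = "CCCCCCHHEE"
pred_labels = "CCCCCCCCEE"  # a predictor that never outputs H

print(q3(true_labels, pred_labels))
print(per_class_accuracy(true_labels, pred_labels))

balanced_x, balanced_y = random_oversample(list(range(10)), list(true_labels))
print(Counter(balanced_y))
```

In this toy case the global score stays fairly high even though every helix residue is missed, which is why per-class rates are worth reporting next to Q3 when the training set is imbalanced.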



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Palodeto, V., Terenzi, H., Marques, J.L.B. (2009). Training Neural Networks for Protein Secondary Structure Prediction: The Effects of Imbalanced Data Set. In: Huang, DS., Jo, KH., Lee, HH., Kang, HJ., Bevilacqua, V. (eds) Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence. ICIC 2009. Lecture Notes in Computer Science, vol 5755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04020-7_28

  • DOI: https://doi.org/10.1007/978-3-642-04020-7_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04019-1

  • Online ISBN: 978-3-642-04020-7

  • eBook Packages: Computer Science, Computer Science (R0)
