Rule Extraction from SVM for Protein Structure Prediction

  • Chapter
Rule Extraction from Support Vector Machines

Part of the book series: Studies in Computational Intelligence (SCI, volume 80)

Summary

In recent years, much research has focused on improving the accuracy of protein structure prediction, and many significant results have been achieved. However, existing methods cannot explain how a learning result is reached or why a prediction decision is made. Such explanations are important for the acceptance of machine learning technology in bioinformatics applications such as protein structure prediction. Support vector machines (SVMs) have shown better performance than most traditional machine learning approaches in a variety of application areas. However, SVMs are still black-box models: they do not produce comprehensible models that account for the predictions they make. To overcome this limitation, this chapter presents two new rule-generation approaches for understanding protein structure prediction. The first, SVM_DT, combines SVMs with decision trees, drawing on the strong generalization ability of the SVM and the interpretability of the decision tree. The second, SVM_PCPAR, combines SVMs with an association rule (AR) based scheme. We also provide a rule aggregation method that uses conceptual clustering to condense a large number of rules into super rules. Experimental results for protein structure prediction show that SVM_DT and SVM_PCPAR are much more comprehensible than SVMs and that the test accuracy of their rules is comparable. We believe that SVM_DT and SVM_PCPAR can be used both for protein structure prediction and for understanding the predictions. The prediction and its interpretation can be used to guide biological experiments.
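
The summary does not spell out how SVM_DT pairs the two models, so the following is only a minimal sketch of one common pairing, not the chapter's algorithm: train an SVM, relabel the training data with the SVM's own predictions, grow a decision tree on those labels, and read one IF-THEN rule off each root-to-leaf path. The toy data, the feature names, the hyperparameters, and every function name are assumptions made for illustration; the linear SVM is trained with plain stochastic subgradient descent on the hinge loss.

```python
import random

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=300, seed=0):
    """Linear SVM (labels in {-1, +1}) via stochastic subgradient descent
    on the L2-regularized hinge loss. Hyperparameters are illustrative."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [wj * (1 - lr * lam) for wj in w]      # regularization shrink
            if y[i] * score < 1:                       # margin violated
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def svm_predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def gini(labels):
    p = sum(1 for l in labels if l == 1) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y):
    """Grow a binary threshold tree until every leaf is pure."""
    if len(set(y)) == 1:
        return y[0]                                    # pure leaf
    best = None
    for f in range(len(X[0])):
        # Dropping the largest value keeps both sides of the split non-empty.
        for thr in sorted(set(x[f] for x in X))[:-1]:
            left = [i for i, x in enumerate(X) if x[f] <= thr]
            right = [i for i, x in enumerate(X) if x[f] > thr]
            cost = (len(left) * gini([y[i] for i in left])
                    + len(right) * gini([y[i] for i in right]))
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:                                   # identical points: majority leaf
        return max(set(y), key=y.count)
    _, f, thr, left, right = best
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left]),
            build_tree([X[i] for i in right], [y[i] for i in right]))

def tree_predict(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def tree_rules(node, names, conds=()):
    """Turn each root-to-leaf path into one IF-THEN rule."""
    if not isinstance(node, tuple):
        body = " AND ".join(conds) or "TRUE"
        return [f"IF {body} THEN class={node}"]
    f, thr, left, right = node
    return (tree_rules(left, names, conds + (f"{names[f]} <= {thr}",))
            + tree_rules(right, names, conds + (f"{names[f]} > {thr}",)))

# Toy, made-up data: two numeric features, two classes.
X = [[1, 1], [1, 2], [2, 1], [2, 2], [6, 5], [6, 6], [7, 5], [7, 6]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

model = train_linear_svm(X, y)
svm_labels = [svm_predict(model, x) for x in X]       # the SVM acts as the teacher
tree = build_tree(X, svm_labels)                      # the tree mimics the SVM
rules = tree_rules(tree, ["feature_0", "feature_1"])
fidelity = sum(tree_predict(tree, x) == l for x, l in zip(X, svm_labels)) / len(X)

for rule in rules:
    print(rule)
print("fidelity to SVM on training data:", fidelity)
```

Fidelity here measures agreement between the tree and the SVM on the training points. Because the tree is grown until every leaf is pure, it reproduces the SVM's training labels exactly; a depth-limited tree would trade some of that fidelity for fewer, shorter rules.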


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

He, J., Hu, Hj., Chen, B., Tai, P., Harrison, R., Pan, Y. (2008). Rule Extraction from SVM for Protein Structure Prediction. In: Diederich, J. (ed.) Rule Extraction from Support Vector Machines. Studies in Computational Intelligence, vol 80. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75390-2_10

  • DOI: https://doi.org/10.1007/978-3-540-75390-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75389-6

  • Online ISBN: 978-3-540-75390-2

  • eBook Packages: Engineering, Engineering (R0)
