Rule Extraction from SVM for Protein Structure Prediction

He, Jieyue; Hu, Hae-jin; Chen, Bernard; Tai, Phang C.; Harrison, Rob; Pan, Yi

doi:10.1007/978-3-540-75390-2_10

Part of the book series: Studies in Computational Intelligence (SCI, volume 80)

Summary

In recent years, much research has focused on improving the accuracy of protein structure prediction, and many significant results have been achieved. However, existing methods cannot explain how a learning result is reached or why a prediction decision is made. Such explanations are important for the acceptance of machine learning technology in bioinformatics applications such as protein structure prediction. Support vector machines (SVMs) have shown better performance than most traditional machine learning approaches in a variety of application areas. However, SVMs are still black-box models: they do not produce comprehensible models that account for the predictions they make. To overcome this limitation, this chapter presents two new approaches to rule generation for understanding protein structure prediction. Building on the strong generalization ability of the SVM and the interpretability of the decision tree, the first approach combines SVMs with decision trees into a new algorithm called SVM_DT. The second combines SVMs with an association rule (AR) based scheme and is called SVM_PCPAR. We also provide a rule aggregation method that uses conceptual clustering to condense a large number of rules into super rules. Experimental results for protein structure prediction show that the rules produced by SVM_DT and SVM_PCPAR are much more comprehensible than SVMs, and that their test accuracy is comparable. We believe that SVM_DT and SVM_PCPAR can be used both for predicting protein structure and for understanding the prediction. The prediction and its interpretation can be used to guide biological experiments.
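The summary does not spell out how an SVM and a decision tree are coupled in SVM_DT. One common pattern for this kind of rule extraction, sketched here purely as an assumption and not as the authors' algorithm, is to train the SVM first, relabel the training data with the SVM's own predictions, fit a small decision tree to those labels, and read each root-to-leaf path as an IF-THEN rule. The toy data, the feature names (`hydrophobicity`, `charge`), the class names, and all hyperparameters below are hypothetical, and the linear SVM is a plain-Python stand-in for a real SVM package.

```python
# Sketch only: a decision tree is fitted to the labels an SVM predicts,
# and its root-to-leaf paths are read off as IF-THEN rules.
# Data, feature names, class names and hyperparameters are hypothetical.

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM trained by subgradient descent on the regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin: hinge-loss gradient step
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # outside the margin: only the regulariser acts
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def svm_predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def gini(labels):
    p = labels.count(1) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, depth=0, max_depth=2):
    """Greedy binary tree; a leaf is a label, an inner node is (feature, threshold, left, right)."""
    majority = 1 if y.count(1) >= y.count(-1) else -1
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature has two distinct values
        return majority
    _, f, t = best
    li = [i for i, x in enumerate(X) if x[f] <= t]
    ri = [i for i, x in enumerate(X) if x[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def tree_predict(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def tree_to_rules(node, names, conds=()):
    """Turn every root-to-leaf path into one IF-THEN rule string."""
    if not isinstance(node, tuple):
        label = "helix" if node == 1 else "non-helix"
        return ["IF " + (" AND ".join(conds) or "TRUE") + " THEN " + label]
    f, t, left, right = node
    return (tree_to_rules(left, names, conds + (f"{names[f]} <= {t:.2f}",))
            + tree_to_rules(right, names, conds + (f"{names[f]} > {t:.2f}",)))

# Hypothetical toy data: two numeric features per residue window.
names = ["hydrophobicity", "charge"]
X = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.85, 0.4],
     [0.2, 0.8], [0.1, 0.6], [0.3, 0.9], [0.25, 0.7]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

svm = train_linear_svm(X, y)
svm_labels = [svm_predict(svm, x) for x in X]  # the tree learns from these
tree = build_tree(X, svm_labels)
rules = tree_to_rules(tree, names)
# Fidelity: how often the extracted rules agree with the SVM itself.
fidelity = sum(tree_predict(tree, x) == s for x, s in zip(X, svm_labels)) / len(X)

for rule in rules:
    print(rule)
print("fidelity to SVM:", fidelity)
```

Fitting the tree to the SVM's predictions rather than to the raw labels means the rules describe what the SVM has learned, and the fidelity score reports how faithfully they do so. Whether the chapter uses exactly this coupling is not stated in the summary.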

Author information

Authors and Affiliations

School of Computer Science and Engineering, Southeast University, Nanjing, 210096, China
Jieyue He
Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA
Hae-jin Hu, Bernard Chen, Rob Harrison & Yi Pan
Department of Biology, Georgia State University, Atlanta, GA, 30303-4110, USA
Phang C. Tai

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering; School of Medicine, Central Clinical Division, The University of Queensland, Brisbane, Q 4072, Australia
Joachim Diederich (Honorary Professor)

About this chapter

Cite this chapter

He, J., Hu, Hj., Chen, B., Tai, P., Harrison, R., Pan, Y. (2008). Rule Extraction from SVM for Protein Structure Prediction. In: Diederich, J. (eds) Rule Extraction from Support Vector Machines. Studies in Computational Intelligence, vol 80. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75390-2_10

DOI: https://doi.org/10.1007/978-3-540-75390-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75389-6
Online ISBN: 978-3-540-75390-2
eBook Packages: Engineering, Engineering (R0)
