
Protein Cellular Localization with Multiclass Support Vector Machines and Decision Trees

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3594)

Abstract

Many cellular functions are carried out in specific compartments of the cell, so knowing where a protein is localized helps identify its function. This paper investigates two Machine Learning techniques, Support Vector Machines (SVMs) and Decision Trees (DTs), for predicting protein cellular localization. Because the task involves multiple classes and SVMs were originally designed for two-class problems, several strategies for extending SVMs to multiclass problems were investigated, including one proposed by the authors.
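To make the multiclass issue concrete, the sketch below shows how a problem with k classes can be broken into binary subproblems that a two-class learner such as an SVM could handle. It uses one-vs-all and one-vs-one, two widely used decompositions; the abstract does not say which strategies the paper evaluated, and this is not the authors' proposed method. The class labels and the helper names (`one_vs_all_tasks`, `one_vs_one_tasks`, `vote`) are hypothetical, chosen only for illustration, and no classifier is actually trained.

```python
from collections import Counter
from itertools import combinations


def one_vs_all_tasks(classes):
    """One binary task per class: that class (positive) against all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


def vote(pairwise_winners):
    """Combine one-vs-one outputs by majority vote."""
    return Counter(pairwise_winners).most_common(1)[0][0]


# Hypothetical localization labels, not taken from the paper.
classes = ["cytoplasm", "nucleus", "mitochondrion", "extracellular"]

ova = one_vs_all_tasks(classes)   # k binary problems
ovo = one_vs_one_tasks(classes)   # k * (k - 1) / 2 binary problems
print(len(ova), "one-vs-all binary problems")
print(len(ovo), "one-vs-one binary problems")

# Assumed outputs of the six pairwise classifiers for a single protein,
# listed in the same order as the pairs in `ovo`.
winners = ["nucleus", "cytoplasm", "cytoplasm",
           "nucleus", "nucleus", "extracellular"]
print("predicted class:", vote(winners))
```

With four classes, one-vs-all needs four binary classifiers and one-vs-one needs six. The trade-off is that each one-vs-one classifier sees only the examples of two classes, while each one-vs-all classifier sees the whole training set.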




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lorena, A.C., de Carvalho, A.C.P.L.F. (2005). Protein Cellular Localization with Multiclass Support Vector Machines and Decision Trees. In: Setubal, J.C., Verjovski-Almeida, S. (eds) Advances in Bioinformatics and Computational Biology. BSB 2005. Lecture Notes in Computer Science, vol 3594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532323_6


  • DOI: https://doi.org/10.1007/11532323_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28008-8

  • Online ISBN: 978-3-540-31861-3

  • eBook Packages: Computer Science, Computer Science (R0)
