Abstract
Many cellular functions are carried out in specific compartments of the cell, so the cellular localization of a protein is closely related to the identification of its function. This paper investigates the use of two Machine Learning techniques, Support Vector Machines (SVMs) and Decision Trees (DTs), for the protein cellular localization prediction problem. Since this task involves multiple classes and SVMs were originally designed to solve two-class problems, several strategies for extending SVMs to multiclass problems were investigated, including one proposed by the authors.
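Because a binary SVM separates only two classes, a multiclass problem such as localization prediction must first be decomposed into several two-class problems. The sketch below illustrates two standard decompositions, one-against-all (one classifier per class) and one-against-one (one classifier per pair of classes, combined by majority voting). It is a minimal, dependency-free sketch under stated assumptions: the binary learner is a plain perceptron swapped in for the SVM (it finds *a* separating hyperplane, not the maximum-margin one), the function names are hypothetical, the toy data are invented, and the authors' own proposed strategy is not reproduced.

```python
from itertools import combinations
from collections import Counter


def train_perceptron(xs, ys, max_epochs=1000):
    """Train a linear classifier (w, b) on labels ys in {-1, +1}.

    Stand-in for a binary SVM: on linearly separable data the perceptron
    converges to some separating hyperplane, not the max-margin one.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on its correct side
            break
    return w, b


def score(model, x):
    """Signed decision value of a linear model at point x."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_one_vs_all(xs, labels):
    """One binary problem per class: that class (+1) against all others (-1)."""
    return {c: train_perceptron(xs, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}


def predict_one_vs_all(models, x):
    # Pick the class whose classifier returns the largest decision value.
    return max(models, key=lambda c: score(models[c], x))


def fit_one_vs_one(xs, labels):
    """One binary problem per unordered pair of classes: k(k-1)/2 in total."""
    models = {}
    for pos, neg in combinations(sorted(set(labels)), 2):
        # Keep only the examples of the two classes involved in this pair.
        sub = [(x, 1 if l == pos else -1)
               for x, l in zip(xs, labels) if l in (pos, neg)]
        models[(pos, neg)] = train_perceptron([x for x, _ in sub],
                                              [y for _, y in sub])
    return models


def predict_one_vs_one(models, x):
    # Each pairwise classifier casts one vote; the most-voted class wins.
    votes = Counter(pos if score(m, x) > 0 else neg
                    for (pos, neg), m in models.items())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three well-separated 2-D clusters (not the paper's data).
X = [(0, 0), (1, 0), (0, 1), (6, 0), (7, 0), (6, 1), (0, 6), (1, 6), (0, 7)]
Y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

ova = fit_one_vs_all(X, Y)
ovo = fit_one_vs_one(X, Y)
print(len(ova), len(ovo))                                # 3 3
print([predict_one_vs_all(ova, x) for x in X] == Y)      # True
print([predict_one_vs_one(ovo, x) for x in X] == Y)      # True
```

With k = 3 classes both schemes happen to train three classifiers; in general one-against-all trains k of them on the full data set, while one-against-one trains k(k-1)/2 smaller ones, each seeing only the examples of two classes.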
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lorena, A.C., de Carvalho, A.C.P.L.F. (2005). Protein Cellular Localization with Multiclass Support Vector Machines and Decision Trees. In: Setubal, J.C., Verjovski-Almeida, S. (eds) Advances in Bioinformatics and Computational Biology. BSB 2005. Lecture Notes in Computer Science, vol 3594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532323_6
DOI: https://doi.org/10.1007/11532323_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28008-8
Online ISBN: 978-3-540-31861-3
eBook Packages: Computer Science, Computer Science (R0)