Abstract
Multilabel classification is an important problem in bioinformatics and machine learning. In a conventional classification problem, each example belongs to exactly one of several classes. When an example can belong to more than one class simultaneously, the task is called a multilabel classification problem. Protein function classification is a typical example of multilabel classification, since a protein may have more than one function. This paper describes the main characteristics of several multilabel classification methods and applies five methods to protein classification problems. The methods are compared experimentally using traditional machine learning techniques. The paper also compares different evaluation metrics used in multilabel problems.
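To make the single-label versus multilabel distinction concrete, here is a minimal Python sketch on toy data. The protein identifiers, function labels, and function names are hypothetical. The sketch represents each protein's functions as a set of labels, shows binary relevance (one widely used way to reduce a multilabel problem to several conventional binary problems; the abstract does not say whether it is among the five methods compared), and computes two common multilabel metrics, Hamming loss and subset accuracy (again, not necessarily the metrics the paper compares).

```python
"""Minimal sketch: multilabel data, binary relevance, and two common metrics.

All protein identifiers and function labels are hypothetical toy data.
"""
from typing import Dict, List, Set

# Each example (protein) maps to a *set* of labels (functions), not one class.
LABELS: List[str] = ["metabolism", "transport", "signalling"]
Y_TRUE: Dict[str, Set[str]] = {
    "prot_A": {"metabolism"},
    "prot_B": {"metabolism", "transport"},
    "prot_C": {"signalling", "transport"},
}


def binary_relevance(y: Dict[str, Set[str]], labels: List[str]) -> Dict[str, Dict[str, int]]:
    """Split one multilabel problem into one binary (0/1) target per label.

    A separate conventional single-label classifier could then be trained
    on each returned target vector.
    """
    return {lab: {ex: int(lab in ls) for ex, ls in y.items()} for lab in labels}


def hamming_loss(y_true: Dict[str, Set[str]], y_pred: Dict[str, Set[str]],
                 labels: List[str]) -> float:
    """Fraction of (example, label) pairs that are predicted incorrectly."""
    # The symmetric difference counts both missed and spurious labels.
    errors = sum(len(y_true[ex] ^ y_pred[ex]) for ex in y_true)
    return errors / (len(y_true) * len(labels))


def subset_accuracy(y_true: Dict[str, Set[str]], y_pred: Dict[str, Set[str]]) -> float:
    """Fraction of examples whose predicted label set matches exactly."""
    return sum(y_true[ex] == y_pred[ex] for ex in y_true) / len(y_true)


if __name__ == "__main__":
    targets = binary_relevance(Y_TRUE, LABELS)
    print(targets["transport"])  # {'prot_A': 0, 'prot_B': 1, 'prot_C': 1}

    # Hypothetical predictions: prot_B misses "transport"; the others are exact.
    y_pred = {
        "prot_A": {"metabolism"},
        "prot_B": {"metabolism"},
        "prot_C": {"signalling", "transport"},
    }
    print(hamming_loss(Y_TRUE, y_pred, LABELS))  # 1/9, about 0.111
    print(subset_accuracy(Y_TRUE, y_pred))       # 2/3, about 0.667
```

The two metrics illustrate why the choice of evaluation metric matters in multilabel problems: a single missed label on one protein costs only 1/9 under Hamming loss but makes that whole example count as wrong under subset accuracy.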
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cerri, R., da Silva, R.R.O., de Carvalho, A.C.P.L.F. (2009). Comparing Methods for Multilabel Classification of Proteins Using Machine Learning Techniques. In: Guimarães, K.S., Panchenko, A., Przytycka, T.M. (eds) Advances in Bioinformatics and Computational Biology. BSB 2009. Lecture Notes in Computer Science, vol 5676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03223-3_10
DOI: https://doi.org/10.1007/978-3-642-03223-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03222-6
Online ISBN: 978-3-642-03223-3
eBook Packages: Computer Science (R0)