Abstract
Gene function prediction and protein function prediction are complex classification problems where the functional classes are structured according to a predefined hierarchy. To solve these problems, we propose an extended local hierarchical Naive Bayes classifier, where a binary classifier is built for each class in the hierarchy. The extension to conventional local approaches is that each classifier considers both the parent and child classes of the current class. We have evaluated the proposed approach on eight protein function and ten gene function hierarchical classification datasets. The proposed approach achieved somewhat better predictive accuracies than a global hierarchical Naive Bayes classifier.
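The conventional local approach that the abstract extends, with one binary classifier per class in the hierarchy, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy hierarchy, the data, and the names `PARENT`, `BinaryNB`, `train_local` and `predict_top_down` are hypothetical, and the sketch assumes categorical features, Laplace smoothing, single-path top-down prediction, and training in which every instance outside a class counts as a negative example for it. The proposed extension, in which each classifier also considers the parent and child classes, is not shown because the abstract does not specify how those classes are used.

```python
import math
from collections import defaultdict

# Hypothetical toy hierarchy: class -> parent (None marks a top-level class).
PARENT = {"1": None, "2": None, "1.1": "1", "1.2": "1"}

def ancestors_or_self(label):
    """Return the label together with all of its ancestors."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path

class BinaryNB:
    """Binary Naive Bayes over categorical features with Laplace smoothing."""

    def fit(self, X, y):
        self.n = {True: 0, False: 0}                      # instances per side
        self.counts = {True: defaultdict(int), False: defaultdict(int)}
        self.values = defaultdict(set)                    # seen values per feature
        for row, is_pos in zip(X, y):
            self.n[is_pos] += 1
            for j, v in enumerate(row):
                self.counts[is_pos][(j, v)] += 1
                self.values[j].add(v)
        return self

    def prob_positive(self, row):
        """Posterior probability that the instance belongs to this class."""
        total = self.n[True] + self.n[False]
        log_p = {}
        for side in (True, False):
            lp = math.log((self.n[side] + 1) / (total + 2))
            for j, v in enumerate(row):
                k = len(self.values[j]) or 1
                lp += math.log((self.counts[side][(j, v)] + 1) / (self.n[side] + k))
            log_p[side] = lp
        m = max(log_p.values())                           # stabilise the exponentials
        pos = math.exp(log_p[True] - m)
        neg = math.exp(log_p[False] - m)
        return pos / (pos + neg)

def train_local(X, labels):
    """Train one binary classifier per class; an instance is positive for a
    class if its label is that class or one of its descendants."""
    return {
        node: BinaryNB().fit(X, [node in ancestors_or_self(l) for l in labels])
        for node in PARENT
    }

def predict_top_down(models, row):
    """Descend from the root, picking the most probable child at each level."""
    current = None
    while True:
        children = [c for c, p in PARENT.items() if p == current]
        if not children:
            return current
        current = max(children, key=lambda c: models[c].prob_positive(row))

# Hypothetical toy data: two categorical features per instance.
X = [("x", "p"), ("x", "p"), ("x", "q"), ("x", "q"), ("y", "p"), ("y", "q")]
labels = ["1.1", "1.1", "1.2", "1.2", "2", "2"]
models = train_local(X, labels)
```

Calling `predict_top_down(models, ("x", "p"))` walks from the top-level classes down to a leaf. A global approach, by contrast, would train a single model over the whole hierarchy rather than one per class.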
© 2013 Springer-Verlag GmbH Berlin Heidelberg
About this paper
Cite this paper
de Campos Merschmann, L.H., Freitas, A.A. (2013). An Extended Local Hierarchical Classifier for Prediction of Protein and Gene Functions. In: Bellatreche, L., Mohania, M.K. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2013. Lecture Notes in Computer Science, vol 8057. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40131-2_14
DOI: https://doi.org/10.1007/978-3-642-40131-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40130-5
Online ISBN: 978-3-642-40131-2
eBook Packages: Computer Science, Computer Science (R0)