Abstract
The definition of the automatic protein function means designating the function with the automation by utilizing the data that already revealed unknown protein function. The demand for analysis on the sequencing technology such as the next generation genome analysis (NGS) and the subsequent genome are on the rise; thus, the need for the method of predicting the protein function automatically has been more and more highlighted. As for the existing methods, the studies on the definition of function between the similar species based on the similarities of sequence have been primarily conducted. However, this paper aims to designate by automatically predicting the function of genome by utilizing InterPro (IPR) that can represent the properties of the protein family, which similarly groups the protein function. Moreover, the gene ontology (GO), which is the controlled vocabulary to describe the protein function comprehensively, is to be used. As for the data used in the experiment, the analysis on properties was conducted in the sparse state that is deflected to one side. Thus, this paper aims to analyze the prediction method for protein function automatically through selecting the features, assigning the data processing and weights and applying a variety of classification methods to overcome that property.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402
Ashburner M, Ball CA, Blake JA et al (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25–29
Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R (2004) The gene ontology annotation (GOA) database: sharing knowledge in Uniprot with gene ontology. Nucleic Acids Res 32: D262–D266
Chang CC, Lin CJ (2011). LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27
Chawla N, Bowyer K, Hall L, Kegelmeyer P (2002) SMOTE: synthetic minority over-sampling technique. JAIR 16:321–357
Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21(18):3674–3676 Epub, Aug 4 2005
Freund Y, Schapire R (1996) A short introduction to boosting. J Japan Soc Artif Intell 14(5):771–780
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Hunter S, Jones P, Mitchell A et al (2011) InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40:D306–D312
John CP Sequential minimal optimization: a fast algorithm for training support vector machines
Koski LB, Gray MW, Lang BF, Burger G (2005) AutoFACT: an automatic functional annotation and classification tool. BMC Bioinf 6:151
Kubat M, Matwin S (1997) Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of the fourteenth international conference on machine learning, pp 179–186
Martin DM, Berriman M, Barton GJ (2004) GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinf 5:178
Quevillon E, Silventoinen V, Pillai S et al (2005) InterProScan: protein domains identifier. Nucleic Acids Res 33:W116–W120
Shahib A Al, Breitling R, Gilbert D (2005) Feature selection and the class imbalance problem in predicting protein function from sequence. Appl Bioinf 4(3):195–203
Zehetner G (2003) OntoBlast function: from sequence similarities directly to potential functional annotations by ontology terms. Nucleic Acids Res 31(13):803–3799
Acknowledgments
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2013R1A1A2063006).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Jung, J., Lee, H.K., Yi, G. (2014). Functional Annotation of Proteins by a Novel Method Using Weight and Feature Selection. In: Park, J., Zomaya, A., Jeong, HY., Obaidat, M. (eds) Frontier and Innovation in Future Computing and Communications. Lecture Notes in Electrical Engineering, vol 301. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-8798-7_88
Download citation
DOI: https://doi.org/10.1007/978-94-017-8798-7_88
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-8797-0
Online ISBN: 978-94-017-8798-7
eBook Packages: EngineeringEngineering (R0)