Protein Data Condensation for Effective Quaternary Structure Classification

Angiulli, Fabrizio; Fionda, Valeria; Rombo, Simona E.

doi:10.1007/978-3-540-77226-2_81

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4881)

Abstract

Many proteins are composed of two or more subunits, each associated with a different polypeptide chain. The number and arrangement of the subunits forming a protein are referred to as its quaternary structure. The quaternary structure is important because it characterizes the biological function of the protein when it is involved in specific biological processes. Unfortunately, quaternary structures cannot be trivially deduced from protein amino acid sequences. In this work, we propose a protein quaternary structure classification method that exploits the functional domain composition of proteins. It is based on a nearest neighbor condensation technique, which reduces both the portion of the dataset to be stored and the number of comparisons to carry out. The approach appears promising in that it achieves high classification accuracy even though it does not require the entire dataset to be analyzed. Indeed, experimental evaluations show that the proposed method selects a small portion of the dataset for classification (on the order of 6.43%) and that it is very accurate (97.74%).
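To make the idea of nearest neighbor condensation concrete, the following is a minimal sketch of the classic condensed nearest neighbor rule (in the style of Hart), not the specific condensation algorithm used in the paper. It assumes, purely for illustration, that each protein is encoded as a binary vector marking which functional domains it contains and that distance is the Hamming distance; the toy vectors, the class labels, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[int]


def hamming(a: Vector, b: Vector) -> int:
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_label(query: Vector, points: List[Vector], labels: List[str]) -> str:
    """Label of the stored point closest to `query` (ties: earliest stored wins)."""
    best = min(range(len(points)), key=lambda i: hamming(query, points[i]))
    return labels[best]


def condense(points: List[Vector], labels: List[str]) -> List[int]:
    """Return indices of a subset that classifies every training point correctly.

    Hart-style condensation: start from one example and add any training
    point that the current subset misclassifies, until a full pass adds nothing.
    """
    keep = [0]  # seed the subset with the first example
    changed = True
    while changed:
        changed = False
        for i in range(len(points)):
            if i in keep:
                continue
            subset_points = [points[j] for j in keep]
            subset_labels = [labels[j] for j in keep]
            if nearest_label(points[i], subset_points, subset_labels) != labels[i]:
                keep.append(i)
                changed = True
    return keep


# Hypothetical toy data: each column is one functional domain (1 = present).
points = [
    (1, 0, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0),  # illustrative "monomer" examples
    (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1),  # illustrative "homodimer" examples
]
labels = ["monomer"] * 3 + ["homodimer"] * 3

keep = condense(points, labels)
stored_fraction = len(keep) / len(points)
print(f"stored {len(keep)} of {len(points)} examples ({stored_fraction:.0%})")
```

Only the examples indexed by `keep` need to be stored and compared against at classification time, which is the source of the savings in storage and comparisons that the abstract refers to. The fraction retained on this toy data says nothing about the 6.43% figure reported for the real dataset.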

Author information

Authors and Affiliations

DEIS - Università della Calabria, Via P. Bucci 41C, 87036 Rende (CS), Italy
Fabrizio Angiulli & Simona E. Rombo
Dept. of Mathematics, Via P. Bucci 31B, 87036 Rende (CS), Italy
Valeria Fionda

Editor information

Hujun Yin, Peter Tino, Emilio Corchado, Will Byrne, Xin Yao

About this paper

Cite this paper

Angiulli, F., Fionda, V., Rombo, S.E. (2007). Protein Data Condensation for Effective Quaternary Structure Classification. In: Yin, H., Tino, P., Corchado, E., Byrne, W., Yao, X. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2007. IDEAL 2007. Lecture Notes in Computer Science, vol 4881. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77226-2_81

DOI: https://doi.org/10.1007/978-3-540-77226-2_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77225-5
Online ISBN: 978-3-540-77226-2
eBook Packages: Computer Science, Computer Science (R0)