Abstract
Protein-protein interaction (PPI) networks are a valuable source of biological data and contain rich information useful for protein function prediction. PPI networks, however, face data quality challenges: noise in the form of false positive edges, and incompleteness in the form of missing biologically valuable edges. These issues can be addressed by enhancing data quality through graph transformations, which in turn improves protein function prediction. We propose an improved method for extracting similar proteins based on the notion of a relaxed neighborhood. The method can be used to transform PPI network datasets by adding biologically important protein interactions, removing interactions between dissimilar proteins, and increasing the reliability scores of interactions. Experiments on both unweighted and weighted PPI network datasets show that preprocessing with the proposed methodology enhances data quality and improves prediction accuracy over other approaches. The results indicate that the proposed approach can exploit underutilized knowledge, such as distant relationships embedded in the PPI graph.
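To make the idea of a neighborhood-based graph transformation concrete, the following is a minimal sketch only. The abstract does not define the relaxed neighborhood, so this example substitutes plain Jaccard similarity over direct neighbor sets as a stand-in; the function names, the `threshold` parameter, and the toy protein labels are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two neighbor sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transform_graph(edges, threshold=0.5):
    """Rebuild an undirected graph from neighbor-set similarity.

    ASSUMPTION: Jaccard similarity of direct neighbors stands in for the
    paper's relaxed-neighborhood measure, which the abstract does not define.
    Returns a dict mapping each kept pair (u, v), with u < v, to its score.
    """
    # Build the adjacency sets of the original network.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    transformed = {}
    for u, v in combinations(sorted(neighbors), 2):
        # Drop u and v from each other's sets so that an existing direct
        # edge does not influence the similarity score.
        score = jaccard(neighbors[u] - {v}, neighbors[v] - {u})
        if score >= threshold:
            # The pair is kept (or newly added) and weighted by its score.
            transformed[(u, v)] = score
    return transformed


# Toy network with hypothetical protein labels.
toy_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "E")]
toy_result = transform_graph(toy_edges, threshold=0.5)
```

In this toy network, A and B are not directly connected but share the neighbors C and D, so the transformation adds an A-B edge; the original A-C edge is dropped because A and C share no other neighbors. The score attached to each kept pair plays the role of a reliability weight. This illustrates the three effects named in the abstract (adding, removing, and re-weighting edges) under the stated assumptions, not the authors' actual procedure.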
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Satheesh Kumar, D., Krishna Reddy, P., Parekh, N. (2014). Relaxed Neighbor Based Graph Transformations for Effective Preprocessing: A Function Prediction Case Study. In: Srinivasa, S., Mehta, S. (eds) Big Data Analytics. BDA 2014. Lecture Notes in Computer Science, vol 8883. Springer, Cham. https://doi.org/10.1007/978-3-319-13820-6_9
DOI: https://doi.org/10.1007/978-3-319-13820-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13819-0
Online ISBN: 978-3-319-13820-6
eBook Packages: Computer Science, Computer Science (R0)