ABSTRACT
Genome-wide protein networks have become a reality in recent years thanks to high-throughput methods for detecting protein interactions. Recent studies show that a networked representation of proteins models biological systems and processes more accurately than conventional pairwise analyses. Alongside the availability of protein networks, various graph analysis techniques have been proposed to mine these networks for pathway discovery, function assignment, and prediction of complex membership. In this paper, we propose using random walks on graphs for the complex/pathway membership problem. We evaluate the proposed technique on three different probabilistic yeast networks using a benchmark dataset of 27 complexes from the MIPS complex catalog database and 10 pathways from the KEGG pathway database. Furthermore, we compare the proposed technique to two existing techniques in terms of both accuracy and running time, thus addressing the scalability of such analysis techniques for the first time. Our experiments show that the random walk technique achieves similar or better accuracy with a speed-up of more than 1,000 times over the best competing technique.
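To make the membership problem concrete, the sketch below ranks proteins by their affinity to a set of known complex members using a random walk with restart on a weighted graph. The abstract does not state which random walk variant the paper uses, so the restart formulation, the `restart` value, the function name, and the toy network with proteins `A` to `F` are all assumptions chosen for illustration, not the authors' implementation.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the approximate steady-state visit probability of every node for
    a walk that jumps back to the seed set with probability `restart`."""
    seeds = set(seeds)

    # Build a symmetric weighted adjacency map from (u, v, weight) triples.
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    for s in seeds:
        adj.setdefault(s, {})

    # Restart distribution: uniform over the seed proteins.
    start = {n: 0.0 for n in adj}
    for s in seeds:
        start[s] = 1.0 / len(seeds)

    # Total edge weight per node, used to normalise transition probabilities.
    strength = {n: sum(nbrs.values()) for n, nbrs in adj.items()}

    p = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in adj}
        for u, nbrs in adj.items():
            if strength[u] == 0:
                # A node without edges returns its mass to the seeds.
                for n in seeds:
                    nxt[n] += (1 - restart) * p[u] * start[n]
                continue
            for v, w in nbrs.items():
                nxt[v] += (1 - restart) * p[u] * w / strength[u]
        delta = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: edge weights stand in for interaction confidences.
edges = [
    ("A", "B", 0.9), ("B", "C", 0.8), ("A", "C", 0.7),
    ("C", "D", 0.2),
    ("D", "E", 0.9), ("E", "F", 0.8), ("D", "F", 0.9),
]
known_members = ["A", "B"]
scores = random_walk_with_restart(edges, known_members)

# Rank the remaining proteins as candidate members of the same complex.
candidates = sorted(
    (n for n in scores if n not in known_members),
    key=scores.get,
    reverse=True,
)
print(candidates)
```

Each iteration touches every edge once, so the cost per iteration grows linearly with the number of interactions. This is one plausible reason a random walk formulation could scale well on genome-wide networks, although the abstract itself gives no complexity analysis.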
Index Terms
- Analysis of protein-protein interaction networks using random walks