Abstract
The growing availability of data describing biological interactions provides important clues about how complex chains of genes and proteins interact with each other. Most previous approaches either restrict their attention to simple substructures such as paths or trees in these graphs, or use heuristics that provide no performance guarantees when general substructures are analyzed. We investigate a formulation that models pathway structures directly and give a probabilistic algorithm that finds an optimal path structure in \(O(4^{k}n^{2t}k^{t+\log(t+1)+2.92}t^{2})\) time and \(O(n^{t}k\log k+m)\) space, where n and m are, respectively, the numbers of vertices and edges in the given network, k is the number of vertices in the path structure, and t is the maximum number of vertices (i.e., the "width") at each level of the structure. Even for the case t = 1, which corresponds to finding simple paths of length k, our time complexity \(4^{k}n^{O(1)}\) is a significant improvement over previous probabilistic approaches. To allow for the analysis of multiple pathway structures, we further consider a variant of the algorithm that provides probabilistic guarantees for the top suboptimal path structures with a slight increase in time and space. We show that our algorithm can identify pathway structures with high sensitivity by applying it to protein interaction networks in the DIP database.
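To make the t = 1 case concrete, the sketch below searches for a simple path on k vertices using color-coding, a well-known randomized technique for this problem. It is not the algorithm of this paper, whose details the abstract does not give, and it ignores edge weights, so it finds some k-vertex path rather than an optimal one. The function name `find_k_path_color_coding`, the adjacency-dictionary input, and the `trials` and `seed` parameters are all assumptions made for illustration.

```python
import random


def find_k_path_color_coding(adj, k, trials=200, seed=0):
    """Search for a simple path on k vertices with random colorings.

    adj maps each vertex to an iterable of its neighbors (illustrative format).
    Returns a list of k distinct vertices forming a path, or None if no
    trial succeeds. A None result does not prove that no such path exists.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    for _ in range(trials):
        # Assign each vertex one of k colors uniformly at random.
        color = {v: rng.randrange(k) for v in vertices}
        # layer[(v, mask)] holds one path ending at v whose vertices use
        # exactly the colors in the bitmask, each color at most once.
        layer = {(v, 1 << color[v]): [v] for v in vertices}
        for _ in range(k - 1):
            nxt = {}
            for (v, mask), path in layer.items():
                for u in adj[v]:
                    bit = 1 << color[u]
                    if mask & bit:
                        continue  # color already used, so skip this vertex
                    key = (u, mask | bit)
                    if key not in nxt:
                        nxt[key] = path + [u]
            layer = nxt
            if not layer:
                break
        if layer:
            # Distinct colors imply distinct vertices, so the path is simple.
            return next(iter(layer.values()))
    return None
```

Each trial costs time exponential only in k (through the color bitmasks) and polynomial in the network size, which is the general shape of the \(4^{k}n^{O(1)}\) bound quoted above, although the constants of this sketch differ from the paper's.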
Cite this article
Lu, S., Zhang, F., Chen, J. et al. Finding Pathway Structures in Protein Interaction Networks. Algorithmica 48, 363–374 (2007). https://doi.org/10.1007/s00453-007-0155-7
DOI: https://doi.org/10.1007/s00453-007-0155-7