Skip to main content
Log in

Threshold-Based Shortest Path Query over Large Correlated Uncertain Graphs

  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

With the popularity of uncertain data, queries over uncertain graphs have become a hot topic in the database community. As one of the important queries, the shortest path query over an uncertain graph has attracted much attention of researchers due to its wide applications. Although there are some efficient solutions addressing this problem, all existing models ignore an important property existing in uncertain graphs: the correlation among the edges sharing the same vertex. In this paper, we apply Markov network to model the hidden correlation in uncertain graphs and compute the shortest path. Unfortunately, calculating the shortest path and corresponding probability over uncertain graphs modeled by Markov networks is a #P-hard problem. Thus, we propose a filtering-and-verification framework to accelerate the queries. In the filtering phase, we design a probabilistic shortest path index based on vertex cuts and blocks of a graph. We find a series of upper bounds and prune the vertices and edges whose upper bounds of the shortest path probability are lower than the threshold. By carefully picking up the blocks and vertex cuts, the index is optimized to have the maximum pruning capability, so that we can filter a large number of vertices which make no contribution to the final shortest path query results. In the verification phase, we develop an efficient sampling algorithm to determine the final query answers. Finally, we verify the efficiency and effectiveness of our solutions with extensive experiments.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Yuan Y, Chen L, Wang G. Efficiently answering probability threshold-based shortest path queries over uncertain graphs. In Proc. the 15th DASFAA, Apr. 2010, pp.155-170.

  2. Asthana S, King O D, Gibbons F D, Roth F P. Predicting protein complex membership using probabilistic network reliability. Genome Research, 2004, 14(6): 1170-1175.

    Article  Google Scholar 

  3. Nierman A, Jagadish H. ProTDB: Probabilistic data in XML. In Proc. the 28th VLDB, Aug. 2002, pp.646-657.

  4. Adar E, Ré C. Managing uncertainty in social networks. IEEE Data Eng. Bull, 2007, 30(2): 15-22.

  5. Lian X, Chen L. Efficient query answering in probabilistic RDF graphs. In Proc. ACM SIGMOD International Conference on Management of Data, May 2011, pp.157-168.

  6. Zou L, Peng P, Zhao D. Top-K possible shortest path query over a large uncertain graph. In Proc. the 12th Int. Conf. Web Information System Engineering, Oct. 2011, pp.72-86.

  7. Tao Y, Sheng C, Pei J. On k-skip shortest paths. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2011, pp.421-432.

  8. Yao B, Tang M, Li F. Multi-approximate-keyword routing in GIS data. In Proc. the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Nov. 2011, pp.201-210.

  9. Fan W, Wang X, Wu Y. Performance guarantees for distributed reachability queries. Proceedings of the VLDB Endowment, 2012, 5(11): 1304-1315.

    Article  Google Scholar 

  10. Gao J, Jin R, Zhou J, Yu J X, Jiang X, Wang T. Relational approach for shortest path discovery over large graphs. Proceedings of the VLDB Endowment, 2011, 5(4): 358-369.

    Article  Google Scholar 

  11. Yan X, Yu P S, Han J. Substructure similarity search in graph databases. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2005, pp.766-777.

  12. Atkinson K E. An Introduction to Numerical Analysis. John Wiley & Sons, 2008.

  13. Deshpande A, Getoor L, Sen P. Graphical models for uncertain data. Managing and Mining Uncertain Data, Aggarwal C C (ed.), Springer, 1980: 77-112.

  14. Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques (1 edition). The MIT Press, 2009.

  15. Cheng Y, Yuan Y, Wang G, Qiao B, Wang Z. Efficient sampling methods for shortest path query over uncertain graphs. In Proc. the 19th DASFAA, Apr. 2014, pp.124-140.

  16. Jin R, Ruan N, Xiang Y, Lee V. A highway-centric labeling approach for answering distance queries on large sparse graphs. In Proc. ACM SIGMOD International Conference on Management of Data, May 2012, pp.445-456.

  17. Zou Z, Gao H, Li J. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In Proc. the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul. 2010, pp.633-642.

  18. Chandran L S, Ram L S. On the number of minimum cuts in a graph. SIAM Journal on Discrete Mathematics, 2004, 18(1): 177-194.

    Article  MATH  MathSciNet  Google Scholar 

  19. Shoikhet K, Geiger D. A practical algorithm for finding optimal triangulations. In Proc. the 14th National Conference on Artificial Intelligence and the 9th Innovative Applications of Artificial Intelligence Conference (AAAI), Jul. 1997, pp.185-190.

  20. Becker A, Geiger D. A sufficiently fast algorithm for finding close to optimal junction trees. In Proc. the 12th International Conference on Uncertainty in Artificial Intelligence, Aug. 1996, pp.81-89.

  21. Garg N, Vazirani V V, Yannakakis M. Multiway cuts in directed and node weighted graphs. In Proc. the 21st Int. Colloquium on Automata, Languages and Programming, Jul. 1994, pp.487-498.

  22. Chua H N, Sung W K, Wong L. Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics, 2006, 22(13): 1623-1630.

    Article  Google Scholar 

  23. Hua M, Pei J. Probabilistic path queries in road networks: Traffic uncertainty aware path selection. In Proc. the 13th International Conference on Extending Database Technology (EDBT), Mar. 2010, pp.347-358.

  24. Bast H, Funke S, Matijevic D. TRANSIT—Ultrafast shortestpath queries with linear-time preprocessing. In Proc. the 9th DIMACS Implementation Challenge: Shortest Paths, Nov. 2006.

  25. Delling D, Sanders P, Schultes D, Wagner D. Engineering route planning algorithms. In Algorithmics of Large and Complex Networks, Lerner J, Wagner D, Iweig K A (eds.), Springer, 2009, pp.117-139.

  26. Klein P N, Mozes S, Weimann O. Shortest paths in directed planar graphs with negative lengths: A linear-space O(n log2 n)-time algorithm. ACM Transactions on Algorithms (TALG), 2010, 6(2): Article No. 30.

  27. Cohen E, Halperin E, Kaplan H, Zwick U. Reachability and distance queries via 2-hop labels. SIAM Journal on Comp., 2003, 32(5): 1338-1355.

    Article  MATH  MathSciNet  Google Scholar 

  28. Wei F. TEDI: Efficient shortest path query answering on graphs. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp.99-110.

  29. Potamias M, Bonchi F, Castillo C, Gionis A. Fast shortest path distance estimation in large networks. In Proc. the 18th ACM Conference on Information and Knowledge Management, Nov. 2009, pp.867-876.

  30. Gubichev A, Bedathur S, Seufert S,Weikum G. Fast and accurate estimation of shortest paths in large graphs. In Proc. the 19th ACM Conference on Information and Knowledge Management, Nov. 2010, pp.499-508.

  31. Sankaranarayanan J, Samet H, Alborzi H. Path oracles for spatial networks. Proceedings of the VLDB Endowment, 2009, 2(1): 1210-1221.

    Article  Google Scholar 

  32. Samet H, Sankaranarayanan J, Alborzi H. Scalable network distance browsing in spatial databases. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2008, pp.43-54.

  33. Sankaranarayanan J, Alborzi H, Samet H. Efficient query processing on spatial networks. In Proc. the 13th International Workshop on Geographic Information Systems (GIS), Nov. 2005, pp.200-209.

  34. Rice M, Tsotras V J. Graph indexing of road networks for shortest path queries with label restrictions. Proceedings of the VLDB Endowment, 2010, 4(2): 69-80.

    Article  Google Scholar 

  35. Jing N, Huang Y W, Rundensteiner E A. Hierarchical encoded path views for path query processing: An optimal model and its performance evaluation. IEEE Transactions on Knowledge and Data Engineering, 1998, 10(3): 409-432.

    Article  Google Scholar 

  36. Wu L, Xiao X, Deng D, Cong G, Zhu A D, Zhou S. Shortest path and distance queries on road networks: An experimental evaluation. Proceedings of the VLDB Endowment, 2012, 5(5): 406-417.

    Article  Google Scholar 

  37. Cheng J, Ke Y, Chu S, Cheng C. Efficient processing of distance queries in large graphs: A vertex cover approach. In Proc. ACM SIGMOD International Conference on Management of Data, May 2012, pp.457-468.

  38. Tong Y, Chen L, Cheng Y, Yu P S. Mining frequent itemsets over uncertain databases. Proceedings of the VLDB Endowment, 2012, 5(11): 1650-1661.

    Article  Google Scholar 

  39. Tong Y, Chen L, Ding B. Discovering threshold-based frequent closed itemsets over probabilistic data. In Proc. the 28th International Conference on Data Engineering (ICDE), Apr. 2012, pp.270-281.

  40. Yuan Y, Wang G, Chen L, Wang H. Efficient subgraph similarity search on large probabilistic graph databases. Proceedings of the VLDB Endowment, 2012, 5(9): 800-811.

    Article  Google Scholar 

  41. Zou Z, Li J, Gao H, Zhang S. Mining frequent subgraph patterns from uncertain graph data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(9): 1203-1218.

    Article  Google Scholar 

  42. Tong Y, Zhang X, Chen L. Tracking frequent items over distributed probabilistic data. World Wide Web, 2015.

  43. Kanagal B, Deshpande A. Indexing correlated probabilistic databases. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2009, pp.455-468.

  44. Deshpande A, Guestrin C, Hong W H, Madden S. Exploiting correlated attributes in acquisitional query processing. In Proc. the 21st International Conference on Data Engineering (ICDE), Apr. 2005, pp.143-154.

  45. Deshpande A, Garofalakis M, Rastogi R. Independence is good: Dependency-based histogram synopses for highdimensional data. ACM SIGMOD Record, 2001, 30(2): 199-210.

    Article  Google Scholar 

  46. Sen P, Deshpande A, Getoor L. PrDB: Managing and exploiting rich correlations in probabilistic databases. The VLDB Journal, 2009, 18(5): 1065-1090.

    Article  Google Scholar 

  47. Kanagal B, Deshpande A. Lineage processing over correlated probabilistic databases. In Proc. ACM SIGMOD Int. Conf. Management of Data, Jun. 2010, pp.675-686.

  48. Tzoumas K, Deshpande A, Jensen C S. Lightweight graphical models for selectivity estimation without independence assumptions. Proceedings of the VLDB Endowment, 2011, 4(11): 852-863.

    Google Scholar 

  49. Loui R P. Optimal paths in graphs with stochastic or multidimensional weights. Communications of the ACM, 1983, 26(9): 670-676.

    Article  MATH  MathSciNet  Google Scholar 

  50. Jin R, Liu L, Ding B, Wang H. Distance-constraint reachability computation in uncertain graphs. Proceedings of the VLDB Endowment, 2011, 4(9): 551-562.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ye Yuan.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cheng, YR., Yuan, Y., Chen, L. et al. Threshold-Based Shortest Path Query over Large Correlated Uncertain Graphs. J. Comput. Sci. Technol. 30, 762–780 (2015). https://doi.org/10.1007/s11390-015-1559-5

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-015-1559-5

Keywords

Navigation