Abstract
Graphs play an important role in the modern world because of their widespread use for modeling, representing and organizing linked data. Since most of the “killer” applications require a graph-based representation (e.g., the Web, social network management, protein-protein interaction networks), efficient query processing and analysis techniques are required, not only because these graphs are massive but also because the operations that must be supported are complex and demand significant computational resources. In many cases, each graph edge e is annotated with a probability value p(e) that expresses its existential uncertainty: with probability p(e) the edge is present in the graph, and with probability \(1-p(e)\) it is absent. This gives rise to the concept of probabilistic graphs (also known as uncertain graphs). Formally, a probabilistic graph \(\mathcal{G}\) is a triplet (V, E, p), where V is the set of nodes, E is the set of edges and \(p: E \rightarrow (0,1]\) assigns an existence probability to every edge. The main challenge posed by this formulation is that problems that are relatively easy to solve in exact graphs become very difficult (or even intractable) in probabilistic graphs. In this paper, we present an overview of the algorithmic techniques proposed in the literature for uncertain graph analysis. In particular, we focus on the following graph mining tasks: clustering, maximal cliques, k-nearest neighbors and core decomposition. We conclude the paper with a short discussion of distributed mining of uncertain graphs, which is expected to achieve significant performance improvements.
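The definition of a probabilistic graph as a triplet (V, E, p) can be made concrete with a small sketch. The Python code below is an illustration only, not a method from the chapter: it assumes that edges exist independently of one another (the usual possible-world reading, which the abstract does not state explicitly), and the function names, the toy graph and the choice of s-t connectivity as the example query are all hypothetical. It contrasts exact evaluation, which enumerates all 2^|E| deterministic graphs, with a Monte Carlo estimate obtained by sampling.

```python
import itertools
import random
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def sample_world(p: Dict[Edge, float], rng: random.Random) -> Set[Edge]:
    """Draw one deterministic graph: keep each edge e with probability p[e]."""
    return {e for e, prob in p.items() if rng.random() < prob}


def world_probability(p: Dict[Edge, float], world: Set[Edge]) -> float:
    """Probability of observing exactly `world` (assumes independent edges)."""
    result = 1.0
    for e, prob in p.items():
        result *= prob if e in world else 1.0 - prob
    return result


def connected(world: Iterable[Edge], s: Hashable, t: Hashable) -> bool:
    """Undirected reachability from s to t by depth-first search."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in world:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    stack, seen = [s], {s}
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def exact_reliability(p: Dict[Edge, float], s: Hashable, t: Hashable) -> float:
    """Sum the probabilities of all worlds that connect s and t.

    Enumerates 2^|E| worlds, so it is only feasible for tiny graphs.
    """
    edges = list(p)
    total = 0.0
    for mask in itertools.product([False, True], repeat=len(edges)):
        world = {e for e, keep in zip(edges, mask) if keep}
        if connected(world, s, t):
            total += world_probability(p, world)
    return total


def estimate_reliability(p: Dict[Edge, float], s: Hashable, t: Hashable,
                         samples: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of sampled worlds connecting s and t."""
    rng = random.Random(seed)
    hits = sum(connected(sample_world(p, rng), s, t) for _ in range(samples))
    return hits / samples


# Hypothetical toy graph: a triangle whose edges each exist with probability 0.5.
toy_p = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.5}
```

On the toy triangle, `exact_reliability(toy_p, "a", "c")` sums over 8 worlds and gives 0.625, and `estimate_reliability(toy_p, "a", "c")` should land close to that value. The exact variant doubles its work with every additional edge, which is one way to see why tasks that are easy on a deterministic graph become hard on an uncertain one.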
Notes
1. Although existential probabilities can be assigned to the vertices of the graph as well, in this paper we focus on edge probabilities only.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kassiano, V., Gounaris, A., Papadopoulos, A.N., Tsichlas, K. (2017). Mining Uncertain Graphs: An Overview. In: Sellis, T., Oikonomou, K. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2016. Lecture Notes in Computer Science, vol 10230. Springer, Cham. https://doi.org/10.1007/978-3-319-57045-7_6
DOI: https://doi.org/10.1007/978-3-319-57045-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57044-0
Online ISBN: 978-3-319-57045-7
eBook Packages: Computer Science (R0)