Mining Uncertain Graphs: An Overview

Kassiano, Vasileios; Gounaris, Anastasios; Papadopoulos, Apostolos N.; Tsichlas, Kostas

doi:10.1007/978-3-319-57045-7_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10230)

Included in the following conference series:

International Workshop of Algorithmic Aspects of Cloud Computing

Abstract

Graphs play an important role in the modern world, due to their widespread use for modeling, representing and organizing linked data. Since most of the “killer” applications require a graph-based representation (e.g., the Web, social network management, protein-protein interaction networks), efficient query processing and analysis techniques are required, not only because these graphs are massive but also because the operations that must be supported are complex and demand significant computational resources. In many cases, each graph edge e is annotated with a probability value p(e), expressing its existential uncertainty: with probability p(e) the edge is present in the graph and with probability $1-p(e)$ it is absent. This gives rise to the concept of probabilistic graphs (also known as uncertain graphs). Formally, a probabilistic graph $\mathcal{G}$ is a triplet (V, E, p), where V is the set of nodes, E is the set of edges and $p: E \rightarrow (0,1]$ assigns an existence probability to each edge. The main challenge posed by this formulation is that problems that are relatively easy to solve in exact graphs become very difficult (or even intractable) in probabilistic graphs. In this paper, we present an overview of the algorithmic techniques proposed in the literature for uncertain graph analysis. In particular, we focus on the following graph mining tasks: clustering, maximal cliques, k-nearest neighbors and core decomposition. We conclude the paper with a short discussion of distributed mining of uncertain graphs, which is expected to achieve significant performance improvements.
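To make the definition of a probabilistic graph (V, E, p) concrete, the following minimal Python sketch may help. It is not taken from the chapter: the toy graph `EDGE_PROB`, the function names, and the choice of two-terminal reachability as the example query are all illustrative assumptions, and edges are assumed to exist independently of one another. The sketch computes the reachability probability exactly by enumerating all 2^|E| possible worlds, and approximately by sampling worlds, which hints at why problems that are easy on exact graphs become expensive on uncertain ones.

```python
import itertools
import random

# Hypothetical toy uncertain graph (not from the chapter): undirected edges
# mapped to their existence probabilities p(e) in (0, 1].
EDGE_PROB = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "c"): 0.2}


def world_probability(edge_prob, world):
    """Probability of one possible world, ASSUMING independent edges."""
    prob = 1.0
    for edge, p in edge_prob.items():
        prob *= p if edge in world else 1.0 - p
    return prob


def reachable(world, source, target):
    """Depth-first search over the undirected edges present in `world`."""
    adjacency = {}
    for u, v in world:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def exact_reachability(edge_prob, source, target):
    """Sum the probabilities of all 2^|E| worlds in which target is reachable."""
    edges = list(edge_prob)
    total = 0.0
    for mask in itertools.product([False, True], repeat=len(edges)):
        world = {e for e, keep in zip(edges, mask) if keep}
        if reachable(world, source, target):
            total += world_probability(edge_prob, world)
    return total


def sampled_reachability(edge_prob, source, target, samples=10_000, seed=0):
    """Monte Carlo estimate: sample worlds edge by edge and count successes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = {e for e, p in edge_prob.items() if rng.random() < p}
        hits += reachable(world, source, target)
    return hits / samples
```

On this toy graph, `exact_reachability(EDGE_PROB, "a", "c")` evaluates to roughly 0.56, since 1 - (1 - 0.2)(1 - 0.9 · 0.5) = 0.56, and `sampled_reachability` should land close to that value. Exact enumeration doubles in cost with every added edge, whereas the cost of sampling is governed by the number of samples drawn.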

Notes

1. Although existential probabilities can be assigned to the vertices of the graph as well, in this paper we focus on edge probabilities only.

Author information

Authors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Vasileios Kassiano, Anastasios Gounaris, Apostolos N. Papadopoulos & Kostas Tsichlas

Corresponding author

Correspondence to Apostolos N. Papadopoulos.

Editor information

Editors and Affiliations

Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, Victoria, Australia
Timos Sellis
Informatics, Ionian University, Kerkyra, Greece
Konstantinos Oikonomou

Cite this paper

Kassiano, V., Gounaris, A., Papadopoulos, A.N., Tsichlas, K. (2017). Mining Uncertain Graphs: An Overview. In: Sellis, T., Oikonomou, K. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2016. Lecture Notes in Computer Science, vol 10230. Springer, Cham. https://doi.org/10.1007/978-3-319-57045-7_6

DOI: https://doi.org/10.1007/978-3-319-57045-7_6
Published: 11 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57044-0
Online ISBN: 978-3-319-57045-7
eBook Packages: Computer Science; Computer Science (R0)
