DOI: 10.1145/1989323.1989421
research-article

Assessing and ranking structural correlations in graphs

Published: 12 June 2011

Abstract

Real-life graphs not only have nodes and edges, but also have events taking place, e.g., product sales in social networks and virus infection in communication networks. Among different events, some exhibit strong correlation with the network structure, while others do not. Such structural correlation will shed light on viral influence existing in the corresponding network. Unfortunately, the traditional association mining concept is not applicable in graphs since it only works on homogeneous datasets like transactions and baskets.
We propose a novel measure for assessing such structural correlations in heterogeneous graph datasets with events. The measure applies hitting time to aggregate the proximity among nodes that have the same event. In order to calculate the correlation scores for many events in a large network, we develop a scalable framework, called gScore, using sampling and approximation. By comparing to the situation where events are randomly distributed in the same network, our method is able to discover events that are highly correlated with the graph structure. gScore is scalable and was successfully applied to the co-author DBLP network and social networks extracted from TaoBao.com, the largest online shopping network in China, with many interesting discoveries.
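
The idea in the abstract (aggregate hitting-time proximity among nodes that share an event, estimate it by sampling, then compare against random event placement) can be illustrated with a small sketch. This is not the gScore framework itself: the truncation at `max_steps`, the `1 - time / max_steps` closeness, the z-score comparison, and all function names (`truncated_hitting_time`, `proximity_score`, `z_score_vs_random`, `demo_graph`) are assumptions made for illustration only.

```python
import random
from statistics import mean, pstdev


def truncated_hitting_time(adj, start, targets, max_steps, rng):
    """Length of one random walk from `start` until it first reaches a node
    in `targets`, capped at `max_steps` (the cap is an assumption)."""
    node = start
    for step in range(1, max_steps + 1):
        node = rng.choice(adj[node])  # move to a uniformly random neighbor
        if node in targets:
            return step
    return max_steps


def proximity_score(adj, event_nodes, max_steps=10, walks=50, rng=None):
    """Average closeness, in [0, 1), of each event node to the other event
    nodes. Closeness is 1 - (sampled mean hitting time) / max_steps."""
    rng = rng or random.Random(0)
    event_nodes = set(event_nodes)
    scores = []
    for v in sorted(event_nodes):  # sorted only to keep runs reproducible
        others = event_nodes - {v}
        times = [truncated_hitting_time(adj, v, others, max_steps, rng)
                 for _ in range(walks)]
        scores.append(1.0 - mean(times) / max_steps)
    return mean(scores)


def z_score_vs_random(adj, event_nodes, trials=30, seed=0, **kwargs):
    """Compare the observed score with scores obtained when the same number
    of events is placed uniformly at random in the same graph."""
    rng = random.Random(seed)
    observed = proximity_score(adj, event_nodes, rng=rng, **kwargs)
    nodes = sorted(adj)
    k = len(set(event_nodes))
    null = [proximity_score(adj, rng.sample(nodes, k), rng=rng, **kwargs)
            for _ in range(trials)]
    spread = pstdev(null)
    return (observed - mean(null)) / spread if spread > 0 else 0.0


def demo_graph():
    """Toy graph: a 5-clique (nodes 0-4) joined by one edge to a 30-node
    ring (nodes 5-34). Adjacency is a dict of neighbor lists."""
    adj = {v: [u for u in range(5) if u != v] for v in range(5)}
    ring = list(range(5, 35))
    for i, v in enumerate(ring):
        adj[v] = [ring[i - 1], ring[(i + 1) % len(ring)]]
    adj[0].append(5)
    adj[5].append(0)
    return adj


if __name__ == "__main__":
    graph = demo_graph()
    print("clustered event:", proximity_score(graph, [0, 1, 2, 3, 4]))
    print("scattered event:", proximity_score(graph, [8, 14, 20, 26, 32]))
    print("z-score of clustered event:", z_score_vs_random(graph, [0, 1, 2, 3, 4]))
```

In this toy setting, an event confined to the clique should score well above an event spread evenly around the ring, and its z-score against random placement should be positive. A real implementation would need the paper's actual measure and its approximation guarantees, which the abstract does not spell out.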

    Published In

    SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
    June 2011
    1364 pages
    ISBN: 9781450306614
    DOI: 10.1145/1989323

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 June 2011

    Author Tags

    1. graph
    2. hitting time
    3. structural correlation

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '11

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 7
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 28 Feb 2025

    Citations

    Cited By

    • (2024) Graph Evolution-Based Vertex Extraction for Hyperspectral Anomaly Detection. IEEE Transactions on Neural Networks and Learning Systems 35:12 (17372-17386). DOI: 10.1109/TNNLS.2023.3303273. Online publication date: Dec-2024
    • (2024) Hyperspectral Anomaly Detection Using Reconstruction Fusion of Quaternion Frequency Domain Analysis. IEEE Transactions on Neural Networks and Learning Systems 35:6 (8358-8372). DOI: 10.1109/TNNLS.2022.3227167. Online publication date: Jun-2024
    • (2024) Efficient Algorithms for Group Hitting Probability Queries on Large Graphs. IEEE Transactions on Knowledge and Data Engineering (1-13). DOI: 10.1109/TKDE.2023.3349164. Online publication date: 2024
    • (2024) Fast Iterative Graph Computing with Updated Neighbor States. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (2449-2462). DOI: 10.1109/ICDE60146.2024.00193. Online publication date: 13-May-2024
    • (2024) Ingress: an automated incremental graph processing system. The VLDB Journal 33:3 (781-806). DOI: 10.1007/s00778-024-00838-z. Online publication date: 20-Feb-2024
    • (2023) Layph: Making Change Propagation Constraint in Incremental Graph Processing by Layering Graph. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (2766-2779). DOI: 10.1109/ICDE55515.2023.00212. Online publication date: Apr-2023
    • (2022) Generalized Euclidean Measure to Estimate Distances on Multilayer Networks. ACM Transactions on Knowledge Discovery from Data 16:6 (1-22). DOI: 10.1145/3529396. Online publication date: 30-Jul-2022
    • (2022) Personalized Graph Summarization: Formulation, Scalable Algorithms, and Applications. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (2319-2332). DOI: 10.1109/ICDE53745.2022.00219. Online publication date: May-2022
    • (2022) The multi-walker chain and its application in local community detection. Knowledge and Information Systems 60:3 (1663-1691). DOI: 10.1007/s10115-018-1247-1. Online publication date: 10-Mar-2022
    • (2021) Automating incremental graph processing with flexible memoization. Proceedings of the VLDB Endowment 14:9 (1613-1625). DOI: 10.14778/3461535.3461550. Online publication date: 22-Oct-2021
