Research article, DOI: 10.1145/2063576.2063787

Finding information nebula over large networks

Published: 24 October 2011

Abstract

Social and information networks have been studied extensively for years. In this paper, we focus on a large information network composed of entities and relationships, where each entity is associated with a set of keyword terms (kterms) that specify what it is, and relationships describe the link structure among entities, which can be very complex. Our work is motivated by, but differs from, existing work that finds the best subgraph to describe how user-specified entities are connected. We compute an information nebula (cloud), which is a set P of top-K kterms that are most correlated to a set Q of user-specified kterms, over a large information network. Our goal is to find how kterms are correlated given the complex information network among entities. Computing an information nebula requires us to consider all possible kterms when selecting the top-K kterms, and to measure the similarity between kterms by considering all possible subgraphs that connect them rather than the single best one. In this work, we compute the information nebula using a global structural-context similarity, and our similarity measure is independent of connection subgraphs. To the best of our knowledge, none of the existing link-based similarity methods considers similarity between two sets of nodes or between two kterms. We propose new algorithms that find the top-K kterms P for a given set of kterms Q based on the global structural-context similarity, without computing the similarity scores of all kterms in the large information network. We conducted extensive performance studies on large real datasets, which confirmed the effectiveness and efficiency of our approach.
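
To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the algorithm proposed in the paper. It assumes a SimRank-style structural-context similarity between entities and, as a purely hypothetical set-level score, averages pairwise entity similarities between the entities carrying a candidate kterm and the entities carrying the query kterms. All names (`simrank`, `kterm_similarity`, `top_k_kterms`, the `decay` and `iterations` parameters) are illustrative assumptions.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank over a dict: node -> list of in-neighbors.

    Assumption: a plain SimRank recurrence stands in for the paper's
    global structural-context similarity, which the abstract does not define.
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar only to itself.
                new[(a, b)] = 0.0
                continue
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim


def kterm_similarity(sim, entities_p, entities_q):
    """Hypothetical set-level score: mean pairwise similarity of two entity sets."""
    if not entities_p or not entities_q:
        return 0.0
    total = sum(sim[(u, v)] for u in entities_p for v in entities_q)
    return total / (len(entities_p) * len(entities_q))


def top_k_kterms(in_neighbors, kterm_entities, query, k):
    """Rank every non-query kterm against the query kterms and keep the best k."""
    sim = simrank(in_neighbors)
    # Pool the entities of all query kterms into one set.
    query_entities = set().union(*(kterm_entities[q] for q in query))
    scores = {
        term: kterm_similarity(sim, entities, query_entities)
        for term, entities in kterm_entities.items()
        if term not in query
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    # Toy network: entity "c" links to both "a" and "b".
    graph = {"c": [], "a": ["c"], "b": ["c"]}
    kterms = {"x": {"a"}, "y": {"b"}, "z": {"c"}}
    print(top_k_kterms(graph, kterms, query=["x"], k=1))
```

This baseline scores every candidate kterm and computes similarity for every entity pair, which is exactly the exhaustive cost that the abstract says the proposed algorithms avoid.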


Published In

CIKM '11: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
October 2011
2712 pages
ISBN: 9781450307178
DOI: 10.1145/2063576

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. attribute graph
2. fingerprint index
3. information nebula
4. keyword query
5. random walk
6. similarity measure

