research-article

Finding information nebula over large networks

Authors:

Haixun Wang

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

Pages 1465 - 1474

https://doi.org/10.1145/2063576.2063787

Published: 24 October 2011

Abstract

Social and information networks have been studied extensively for years. In this paper, we concentrate on a large information network composed of entities and relationships, where each entity is associated with a set of keyword terms (kterms) that specify what it is, and relationships describe the link structure among entities, which can be very complex. Our work is motivated by, but differs from, existing work that finds a best subgraph to describe how user-specified entities are connected. We compute an information nebula (cloud), which is a set of top-K kterms P that are most correlated with a set of user-specified kterms Q, over a large information network. Our goal is to find how kterms are correlated given the complex information network among entities. Computing an information nebula requires us to take all possible kterms into consideration when selecting the top-K kterms, and to measure the similarity between kterms by considering all possible subgraphs that connect them rather than the single best one. In this work, we compute the information nebula using a global structural-context similarity, and our similarity measure is independent of connection subgraphs. To the best of our knowledge, none of the existing link-based similarity methods considers the similarity between two sets of nodes or between two kterms. We propose new algorithms that find the top-K kterms P for a given set of kterms Q based on the global structural-context similarity, without computing all the similarity scores of kterms in the large information network. We conducted extensive performance studies on large real datasets and confirmed the effectiveness and efficiency of our approach.
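To make the problem statement concrete, the following is a minimal brute-force sketch of ranking kterms by a structural-context similarity. It is not the paper's algorithm. It assumes a SimRank-style measure between entities (SimRank is a known structural-context similarity by Jeh and Widom), and it assumes that the similarity between two kterms is the average entity-pair similarity over the entities carrying them. The function names `simrank`, `kterm_similarity`, and `top_k_kterms`, the decay value, and the toy graph are all hypothetical. Unlike the approach described in the abstract, this sketch does compute every similarity score, which is exactly the cost the proposed algorithms aim to avoid.

```python
from itertools import product


def simrank(neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank on an undirected graph {node: set(neighbors)}.

    Assumption: a SimRank-style score stands in for the paper's measure.
    """
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        nxt = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                nxt[(a, b)] = 1.0
            elif not neighbors[a] or not neighbors[b]:
                nxt[(a, b)] = 0.0
            else:
                # Average similarity over all pairs of neighbors, damped by decay.
                total = sum(sim[(x, y)] for x in neighbors[a] for y in neighbors[b])
                nxt[(a, b)] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = nxt
    return sim


def kterm_similarity(p, q, kterms, sim):
    """Assumed aggregate: mean entity-pair similarity between carriers of p and q."""
    ents_p = [e for e, terms in kterms.items() if p in terms]
    ents_q = [e for e, terms in kterms.items() if q in terms]
    if not ents_p or not ents_q:
        return 0.0
    total = sum(sim[(x, y)] for x in ents_p for y in ents_q)
    return total / (len(ents_p) * len(ents_q))


def top_k_kterms(query, kterms, sim, k):
    """Score every kterm outside the query set Q and return the k best as (kterm, score)."""
    candidates = set().union(*kterms.values()) - set(query)
    scored = [
        (sum(kterm_similarity(p, q, kterms, sim) for q in query) / len(query), p)
        for p in candidates
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))  # ties broken alphabetically
    return [(p, score) for score, p in scored[:k]]


# Hypothetical toy network: authors a1 and a2 share paper p1; a3 wrote p2.
neighbors = {
    "a1": {"p1"}, "a2": {"p1"}, "a3": {"p2"},
    "p1": {"a1", "a2"}, "p2": {"a3"},
}
kterms = {
    "a1": {"graph"}, "a2": {"search"}, "a3": {"fuzzy"},
    "p1": {"network"}, "p2": {"logic"},
}

sim = simrank(neighbors)
print(top_k_kterms({"graph"}, kterms, sim, k=1))
```

In this toy network the kterm "search" ranks first for the query {"graph"}, because its only carrier, a2, shares the neighbor p1 with a1, the only carrier of "graph". The entity a3 lies in a disconnected component, so "fuzzy" receives a score of zero.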

Cited By

N. Jayaram, A. Khan, C. Li, X. Yan, and R. Elmasri. Querying Knowledge Graphs by Example Entity Tuples. IEEE Transactions on Knowledge and Data Engineering, 27(10):2797-2811, 2015. DOI: 10.1109/TKDE.2015.2426696. Online publication date: 1-Oct-2015.
https://dl.acm.org/doi/10.1109/TKDE.2015.2426696

Index Terms

Finding information nebula over large networks
1. Information systems
  1. Information retrieval
  2. Information storage systems


Published In

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

October 2011

2712 pages

ISBN:9781450307178

DOI:10.1145/2063576

Editors:
Bettina Berendt,
Arjen de Vries,
Wenfei Fan,
Craig Macdonald (University of Glasgow, UK),
Iadh Ounis (University of Glasgow, UK),
Ian Ruthven (University of Strathclyde, UK)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM '11

CIKM '11: International Conference on Information and Knowledge Management

October 24 - 28, 2011

Glasgow, Scotland, UK

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Article Metrics

Total Citations: 1
Total Downloads: 331
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Jan 2025
