Web Structure Mining by Isolated Cliques

Yushi UNO
Yoshinobu OTA
Akio UEMICHI

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E90-D No.12 pp.1998-2006
Publication Date: 2007/12/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e90-d.12.1998
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Data Mining
Keyword:
link analysis, link farm, isolated clique, webgraph, web community, web structure mining

Summary: 
The link structure of the Web is generally viewed as a graph, called the webgraph. Web structure mining is a research area that mainly aims to find hidden communities by analyzing the webgraph, where communities or their cores are assumed to form dense subgraphs. Structure mining can therefore be realized by enumerating such substructures, among which Kleinberg's biclique model is well known. In this paper, we examine several candidate substructures, including conventional bicliques, and attempt to extract useful information from real web data. In particular, we introduce isolated cliques as a new candidate substructure in our structure mining experiments. As a result, we found that isolated cliques spanning multiple domains can represent useful communities, which supports the validity of the isolated clique as a candidate substructure for structure mining. On the other hand, we also observed that most isolated cliques on the Web correspond to menu structures contained within single domains, and that isolated cliques can be quite useful for detecting harmful link farms.
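
The summary does not define an isolated clique, so the following is only a minimal sketch under an assumed definition: a clique of k vertices is treated as c-isolated when fewer than c*k edges connect it to the rest of the graph. The function names, the parameter c, and the toy graph are hypothetical illustrations and are not taken from the paper; the check is a brute-force test of one vertex set, not the enumeration method used by the authors.

```python
from itertools import combinations


def is_clique(adj, nodes):
    """Return True if every pair of vertices in `nodes` is adjacent."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def outgoing_edges(adj, nodes):
    """Count edges with exactly one endpoint inside `nodes`."""
    members = set(nodes)
    return sum(1 for u in members for v in adj[u] if v not in members)


def is_isolated_clique(adj, nodes, c=1.0):
    """Assumed definition: a clique of k vertices with fewer than c*k outgoing edges."""
    k = len(nodes)
    return k > 0 and is_clique(adj, nodes) and outgoing_edges(adj, nodes) < c * k


# Hypothetical undirected toy graph: a triangle a-b-c attached to a path c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

triangle_isolated = is_isolated_clique(adj, ["a", "b", "c"])  # one edge leaves the triangle
pair_isolated = is_isolated_clique(adj, ["c", "d"])           # three edges leave this pair
```

In this toy graph the triangle has a single edge to the outside and so passes the test with c = 1, while the pair c-d has three outside edges and does not. Under the same assumption, a set of pages that link densely to one another but barely to anything else, as in a menu structure or a link farm, would pass this kind of test.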