Extract and rank web communities

Published: 12 June 2013

Abstract

A web community is a pattern in the WWW, understood as a set of related web pages. In this paper, we propose an efficient algorithm for finding the web communities on a given topic. Instead of working on the whole web graph, we work on a web domain, which we extract from the topic-specific search results. The resulting communities are therefore highly related to the search topic.
The rank of a community denotes the degree of relevance between the search query and that community. We introduce an approach for ranking the extracted communities based on their dense bipartite pattern. Ranking significantly improves the relevance of the extracted communities to the search topic.
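
The abstract does not state the ranking formula, so the following is only a minimal sketch of one way a "dense bipartite pattern" could be turned into a score. It assumes each candidate community is a pair of page sets (here called fans and centers) and scores it by the fraction of possible fan-to-center links that actually exist. The names `bipartite_density` and `rank_communities`, the fan/center terminology, and the sample pages are illustrative assumptions, not the paper's method.

```python
from itertools import product


def bipartite_density(fans, centers, edges):
    """Fraction of possible fan -> center links present in `edges`.

    Assumed scoring rule for illustration; the paper's formula may differ.
    """
    possible = len(fans) * len(centers)
    if possible == 0:
        return 0.0
    present = sum(1 for f, c in product(fans, centers) if (f, c) in edges)
    return present / possible


def rank_communities(communities, edges):
    """Return (name, density) pairs sorted from densest to sparsest."""
    scored = [
        (name, bipartite_density(fans, centers, edges))
        for name, (fans, centers) in communities.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical link data: (source page, target page).
edges = {
    ("p1", "c1"), ("p1", "c2"), ("p2", "c1"), ("p2", "c2"),
    ("p3", "c3"), ("p3", "c4"), ("p4", "c4"),
}

# Hypothetical candidate communities: name -> (fans, centers).
communities = {
    "B": ({"p3", "p4"}, {"c3", "c4"}),
    "A": ({"p1", "p2"}, {"c1", "c2"}),
}

ranking = rank_communities(communities, edges)
print(ranking)
```

In this toy input, community A is a complete bipartite graph and so ranks above B, which is missing one of its four possible links.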


Published In

WIMS '13: Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics
June 2013
408 pages
ISBN:9781450318501
DOI:10.1145/2479787

Sponsors

  • UAM: Autonomous University of Madrid

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. dense bipartite graph
  2. domain graph
  3. ranking web communities
  4. structured web search
  5. web community
  6. web graph

Qualifiers

  • Research-article

Conference

WIMS '13
Sponsor:
  • UAM

Acceptance Rates

WIMS '13 Paper Acceptance Rate 28 of 72 submissions, 39%;
Overall Acceptance Rate 140 of 278 submissions, 50%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 93
  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 13 Feb 2025
