Article

Mirror site maintenance based on evolution associations of web directories

Authors:
Ling Chen

Hannover, Germany

Hannover, Germany
View Profile

,
Sourav Bhowmick

Singapore, Singapore

Singapore, Singapore
View Profile

,
Wolfgang Nejdl

Hannover, Germany

Hannover, Germany
View Profile

WWW '07: Proceedings of the 16th international conference on World Wide WebMay 2007Pages 1297–1298https://doi.org/10.1145/1242572.1242817

Published:08 May 2007Publication History

WWW '07: Proceedings of the 16th international conference on World Wide Web

Pages 1297–1298

ABSTRACT

Mirroring Web sites is a well-known technique commonly used in the Web community. A mirror site should be updated frequently to ensure that it reflects the content of the original site. Existing mirroring tools apply page-level strategies to check each page of a site, which is inefficient and expensive. In this paper, we propose a novel site-level mirror maintenance strategy. Our approach studies the evolution of Web directorystructures and mines association rules between ancestor-descendant Web directories. Discovered rules indicate the evolution correlations between Web directories. Thus, when maintaining the mirror of a Web site (directory), we can optimally skipsubdirectories which are negatively correlated with it in undergoing significant changes. The preliminary experimental results show that our approach improves the efficiency of the mirror maintenance process significantly while sacrificing slightly in keeping the "freshness" of the mirrors.

References

Rsync Rsync. In http://samba.anu.edu.au/rsync/.Google Scholar
Web mirroring project. In http://www.l3s.de/lchen/mirror.pdf.Google Scholar
K. Bharat and A. Z. Broder. Mirror, mirror on the web: A study of host pairs with replicated content.Computer Networks, 1999. Google ScholarDigital Library
A. Ntoulas, J. Cho, and C. Olston. What's new on the web?: the evolution of the web from a search engine perspective. In WWW, 2004. Google ScholarDigital Library
Y. Wang, D. J. DeWitt, and J.-Y. Cai. X-diff: An effective change detection algorithm for xml documents. In ICDE, 2003.Google ScholarCross Ref

Index Terms

Mirror site maintenance based on evolution associations of web directories
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Web evolution and Web Science

This paper examines the evolution of the World Wide Web as a network of networks and discusses the emergence of Web Science as an interdisciplinary area that can provide us with insights on how the Web developed, and how it has affected and is affected ...
Read More
Estimating the size and evolution of categorised topics in web directories

In this paper a statistical approach for estimating the evolution of categorized web page populations in web directories is proposed. The proposal is based on the capture-recapture method used in wildlife biological studies and it is modified according ...
Read More
On the evolution of clusters of near-duplicate web pages

This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basisover the span of 11 weeks. We then determined which of these pages are near-...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN:9781595936547
DOI:10.1145/1242572
General Chairs:
Carey Williamson
University of Calgary, Canada
,
Mary Ellen Zurko
IBM, USA
,
Program Chairs:
Peter Patel-Schneider
Bell Labs Research, USA
,
Prashant Shenoy
University of Massachusetts at Amherst, USA
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 8 May 2007
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
evolution correlation
mirror maintenance
web evolution
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate1,899of8,196submissions,23%
Upcoming Conference
WWW '24

Sponsor:

sigweb

The ACM Web Conference 2024

May 13 - 17, 2024

Singapore , Singapore
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1
  Total Citations
  View Citations
- 191
  Total Downloads
- Downloads (Last 12 months)1
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Mirror site maintenance based on evolution associations of web directories

WWW '07: Proceedings of the 16th international conference on World Wide Web

ABSTRACT

References

Cited By

Index Terms

Recommendations

Web evolution and Web Science

Estimating the size and evolution of categorised topics in web directories

On the evolution of clusters of near-duplicate web pages