Using Random Walks for Mining Web Document Associations

Selçuk Candan, K.; Li, Wen-Syan

doi:10.1007/3-540-45571-X_35

K. Selçuk Candan and Wen-Syan Li

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1805)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

The World Wide Web has emerged as a primary means for storing and structuring information. In this paper, we present a framework for mining implicit associations among Web documents. We focus on the following problem: “For a given set of seed URLs, find a list of Web pages that reflect the association among these seeds.” In the proposed framework, the association between two documents is induced by their connectivity and by the lengths of the linking paths between them. Based on this framework, we have developed a random walk-based Web mining technique and validated it through experiments on real Web data. We also discuss how the algorithm can be extended to take document contents into account.
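The abstract does not spell out the algorithm itself. As a rough illustration only, the sketch below uses a generic, swapped-in technique, random walk with restart (personalized PageRank), rather than the authors' own formulation. It shows how a random walk can make a page's score depend on both connectivity (many paths to the seeds) and path length (each extra step survives only with probability `1 - restart`). Assumptions, none of which come from the paper: links are treated as undirected, per-seed scores are combined by multiplication, the restart probability is 0.15, and the helper names (`undirected`, `rwr_scores`, `associated_pages`) and the toy graph `EXAMPLE_LINKS` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy link graph: page -> pages it links to.
EXAMPLE_LINKS = {
    "a": ["x", "z"],
    "b": ["x", "v"],
    "x": ["y"],
    "z": ["w"],
}


def undirected(links):
    """Treat every hyperlink as an undirected edge (an assumption), so a
    link in either direction counts toward connectivity."""
    graph = defaultdict(set)
    for src, dsts in links.items():
        for dst in dsts:
            if src != dst:
                graph[src].add(dst)
                graph[dst].add(src)
    return graph


def rwr_scores(graph, seed, restart=0.15, iters=100):
    """Random walk with restart (personalized PageRank) from one seed.

    At each step the walker jumps back to the seed with probability
    `restart`, so a page reached only by long paths receives little mass,
    while a page reached by many short paths receives more.
    """
    if seed not in graph:
        raise ValueError(f"seed {seed!r} has no links")
    p = {n: 0.0 for n in graph}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        nxt[seed] += restart
        for n, mass in p.items():
            if mass:
                # Spread the surviving mass evenly over n's neighbors.
                share = (1.0 - restart) * mass / len(graph[n])
                for m in graph[n]:
                    nxt[m] += share
        p = nxt
    return p


def associated_pages(links, seeds, k=3, restart=0.15):
    """Rank non-seed pages by the product of their per-seed walk scores,
    so a page must be close to every seed to rank highly (assumption)."""
    graph = undirected(links)
    per_seed = [rwr_scores(graph, s, restart) for s in seeds]
    combined = {}
    for n in graph:
        if n in seeds:
            continue
        score = 1.0
        for p in per_seed:
            score *= p[n]
        combined[n] = score
    return sorted(combined, key=combined.get, reverse=True)[:k]


if __name__ == "__main__":
    # "x" is the only page linked from both seeds "a" and "b".
    print(associated_pages(EXAMPLE_LINKS, ["a", "b"], k=1))  # ['x']
```

Multiplying the per-seed scores is one simple way to favor pages that sit between all seeds rather than near just one of them; summing the scores instead would let a page that is very close to a single seed outrank a shared neighbor.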

This work was performed while the author was visiting NEC CCRL.



Author information

Authors and Affiliations

C&C Research Laboratories, NEC USA, Inc., MS/SJ10, San Jose, CA, 95134, USA
K. Selçuk Candan
Computer Sci. and Eng. Dept., Arizona State University, Tempe, AZ, 85287, USA
Wen-Syan Li


Editor information

Editors and Affiliations

Graduate School of Systems Management, University of Tsukuba, 3-29-1 Otsuka, Bunkyo-ku, Tokyo, 112-0012, Japan
Takao Terano
Department of Computer Science and Engineering, Arizona State University, P.O. Box 875406, Tempe, AZ, 85287-5406
Huan Liu
Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan ROC
Arbee L. P. Chen


About this paper

Cite this paper

Selçuk Candan, K., Li, WS. (2000). Using Random Walks for Mining Web Document Associations. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_35


DOI: https://doi.org/10.1007/3-540-45571-X_35
Published: 24 March 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67382-8
Online ISBN: 978-3-540-45571-4
eBook Packages: Springer Book Archive
