
Using Random Walks for Mining Web Document Associations

  • Conference paper
Knowledge Discovery and Data Mining. Current Issues and New Applications (PAKDD 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1805)

Abstract

The World Wide Web has emerged as a primary means for storing and structuring information. In this paper, we present a framework for mining implicit associations among Web documents. We focus on the following problem: “For a given set of seed URLs, find a list of Web pages which reflect the association among these seeds.” In the proposed framework, the association between two documents is induced by their connectivity and by the lengths of the link paths between them. Based on this framework, we have developed a random walk-based Web mining technique and validated it through experiments on real Web data. We also discuss an extension of the algorithm that takes document contents into account.
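The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the link graph is a plain `{page: [linked pages]}` dict, ignores link direction, runs a random walk with restart from each seed separately, and then multiplies the per-seed scores so that pages that are well connected to, and a short path away from, every seed rank highest. The function names, the restart probability of 0.15, and the product combination rule are all hypothetical choices made for illustration.

```python
def undirected_adjacency(links):
    """Build an undirected adjacency map from a {page: [linked pages]} dict.

    Assumption of this sketch: link direction is ignored.
    """
    adj = {}
    for u, targets in links.items():
        adj.setdefault(u, set())
        for v in targets:
            adj.setdefault(v, set())
            if u != v:  # ignore self-links
                adj[u].add(v)
                adj[v].add(u)
    return adj


def random_walk_with_restart(adj, seed, restart=0.15, iters=200, tol=1e-12):
    """Iterate a random walk that jumps back to `seed` with probability `restart`.

    Short, well-connected paths from the seed accumulate more probability mass.
    """
    scores = {n: 0.0 for n in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for u, p in scores.items():
            if p == 0.0:
                continue
            nxt[seed] += restart * p
            nbrs = adj[u]
            if nbrs:
                share = (1.0 - restart) * p / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                nxt[seed] += (1.0 - restart) * p  # isolated node: walk returns to seed
        delta = sum(abs(nxt[n] - scores[n]) for n in adj)
        scores = nxt
        if delta < tol:
            break
    return scores


def associated_pages(links, seeds, k=10, restart=0.15):
    """Rank non-seed pages by the product of their per-seed walk scores (hypothetical rule)."""
    adj = undirected_adjacency(links)
    for s in seeds:
        adj.setdefault(s, set())  # a seed with no links still needs a node
    per_seed = [random_walk_with_restart(adj, s, restart) for s in seeds]
    combined = {}
    for page in adj:
        if page in seeds:
            continue
        score = 1.0
        for scores in per_seed:
            score *= scores[page]  # zero if the page is unreachable from any seed
        combined[page] = score
    return sorted(combined, key=combined.get, reverse=True)[:k]


# Toy graph (made-up page names): "x" is linked from both seeds "a" and "b".
links = {"a": ["x"], "b": ["x"], "x": ["y"], "y": [], "c": ["a"]}
print(associated_pages(links, ["a", "b"], k=3))  # "x" should rank first
```

The product rule is one simple way to prefer pages tied to all seeds at once; a sum would instead favor pages close to any single seed, which does not capture association among the whole seed set.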

This work was performed while the author was visiting NEC CCRL.




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Selçuk Candan, K., Li, WS. (2000). Using Random Walks for Mining Web Document Associations. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_35


  • DOI: https://doi.org/10.1007/3-540-45571-X_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67382-8

  • Online ISBN: 978-3-540-45571-4

  • eBook Packages: Springer Book Archive
