ParaSite: mining structural information on the Web

https://doi.org/10.1016/S0169-7552(97)00033-0

Abstract

Web information retrieval tools typically make use of only the text on pages, ignoring valuable information implicitly contained in links. At the other extreme, viewing the Web as a traditional hypertext system would also be a mistake, because heterogeneity, cross-domain links, and the dynamic nature of the Web mean that many assumptions of typical hypertext systems do not apply. The novelty of the Web leads to new problems in information access, and it is necessary to make use of the new kinds of information available, such as multiple independent categorization, naming, and indexing of pages. This paper discusses the varieties of link information (not just hyperlinks) on the Web, how the Web differs from conventional hypertext, and how the links can be exploited to build useful applications. Specific applications presented as part of the ParaSite system find individuals' homepages, new locations of moved pages, and unindexed information.

