ABSTRACT
The Site Browser endeavors to build an overview browsing system for the entire Web. Overview browsing is an alternative to the search-based view of information work: it provides a consistent set of summary views that can be browsed interactively. The views partition and linearize the corpus for ready understanding and exploration. They show a web site's relation to other sites, the broad nature of the information it contains, how it is structured, and how it has changed over time. The design challenge is to generate useful summary information in a process fast enough to support daily updates. Our current system maintains a continuously updated archive of 46 million sites representing 2.3 billion web pages.
Index Terms
- The site browser: catalyzing improvements in hypertext organization