ABSTRACT
The emergence of personalized homepage services, e.g., personalized Google Homepage and Microsoft Windows Live, has enabled Web users to select Web content of interest and aggregate it in a single Web page. The content usually consists of predefined blocks supplied by the service providers. However, defining these content blocks and keeping their information up to date requires intensive manual effort. In this paper, we propose a novel personalized homepage system, called "Homepage Live", that allows end users to collect their favorite Web content blocks from existing Web pages with drag-and-drop actions and organize them in a single page. Moreover, Homepage Live automatically traces changes to the selected blocks as their container pages evolve, by measuring the tree edit distance of the selected blocks. By exploiting the immutable elements of Web pages, the performance of the tracing algorithm is significantly improved. Experimental results demonstrate the effectiveness and efficiency of our algorithm.
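To make the block-tracing idea concrete, the following is a minimal sketch, not the algorithm used by Homepage Live. It assumes a block is represented as an ordered tree of tag names and uses a simplified top-down (Selkow-style) edit distance with unit costs in place of whatever distance the system actually computes. The names `Node`, `distance`, and `trace_block` are hypothetical, and the speed-up from immutable page elements mentioned above is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a DOM element: a tag plus ordered children."""
    tag: str
    children: tuple = ()


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(c) for c in node.children)


def distance(a, b):
    """Simplified top-down edit distance between two ordered trees.

    Relabeling a node costs 1; inserting or deleting a whole subtree
    costs its node count. This is an assumption for illustration only.
    """
    cost = 0 if a.tag == b.tag else 1
    m, n = len(a.children), len(b.children)
    # dp[i][j]: cost of aligning the first i children of a with the first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(a.children[i - 1]),      # delete subtree
                dp[i][j - 1] + size(b.children[j - 1]),      # insert subtree
                dp[i - 1][j - 1]
                + distance(a.children[i - 1], b.children[j - 1]),  # match
            )
    return cost + dp[m][n]


def subtrees(node):
    """Yield every subtree of `node`, including `node` itself."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def trace_block(selected, new_page):
    """Return the subtree of the updated page closest to the selected block."""
    return min(subtrees(new_page), key=lambda cand: distance(selected, cand))


# Block the user originally selected: a heading followed by a two-item list.
selected = Node("div", (Node("h2"), Node("ul", (Node("li"), Node("li")))))

# Updated container page: the block gained a list item and a new sibling block.
new_page = Node("body", (
    Node("div", (Node("img"),)),
    Node("div", (Node("h2"), Node("ul", (Node("li"), Node("li"), Node("li"))))),
))

match = trace_block(selected, new_page)
print(match is new_page.children[1], distance(selected, match))
```

This sketch compares the selected block against every subtree of the new page, which is quadratic or worse on real pages. Restricting the candidates, for example to regions bounded by elements that do not change between versions, is the kind of pruning the abstract attributes to immutable elements.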
Index Terms
- Homepage live: automatic block tracing for web personalization