DOI: 10.1145/1242572.1242574
Article

Homepage live: automatic block tracing for web personalization

Published: 08 May 2007

ABSTRACT

The emergence of personalized homepage services, e.g., personalized Google Homepage and Microsoft Windows Live, has enabled Web users to select Web content of interest and aggregate it in a single Web page. This content usually consists of predefined content blocks supplied by the service providers. However, defining the content blocks and maintaining the information in them requires intensive manual effort. In this paper, we propose a novel personalized homepage system, called "Homepage Live", that allows end users to use drag-and-drop actions to collect their favorite Web content blocks from existing Web pages and organize them in a single page. Moreover, Homepage Live automatically traces changes to the blocks as their container pages evolve by measuring the tree edit distance of the selected blocks. By exploiting the immutable elements of Web pages, the performance of the tracing algorithm is significantly improved. The experimental results demonstrate the effectiveness and efficiency of our algorithm.
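
The block-tracing idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it substitutes a simplified top-down, ordered tree edit distance (relabeling a node costs 1, inserting or deleting a whole subtree costs its size) and does not model the immutable-element optimization. The `Node` type and the names `distance` and `trace_block` are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical minimal DOM node: a tag name plus ordered children."""
    tag: str
    children: tuple = ()


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(c) for c in node.children)


def distance(a, b):
    """Simplified top-down edit distance between two ordered trees.

    Assumed cost model (not taken from the paper): relabeling a node
    costs 1; inserting or deleting a subtree costs its node count.
    """
    relabel = 0 if a.tag == b.tag else 1
    xs, ys = a.children, b.children
    # dp[i][j] = cheapest way to turn the forest xs[:i] into ys[:j]
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),      # delete subtree
                dp[i][j - 1] + size(ys[j - 1]),      # insert subtree
                dp[i - 1][j - 1] + distance(xs[i - 1], ys[j - 1]),
            )
    return relabel + dp[-1][-1]


def subtrees(node):
    """Yield every subtree of `node`, the root first."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def trace_block(block, new_page):
    """Return the subtree of `new_page` closest to the saved `block`."""
    return min(subtrees(new_page), key=lambda cand: distance(block, cand))


# A saved block, and a later version of its page where the block has
# moved and its list has gained one item.
block = Node("div", (Node("h2"), Node("ul", (Node("li"), Node("li")))))
new_page = Node("body", (
    Node("div", (Node("img"),)),
    Node("div", (Node("h2"), Node("ul", (Node("li"),) * 3))),
))
match = trace_block(block, new_page)
```

Here `match` is the second `div` of the new page. This exhaustive search compares the block against every subtree, which is the kind of cost that restricting candidates (for example, around page elements known not to change) would presumably reduce.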


          Published in

          WWW '07: Proceedings of the 16th international conference on World Wide Web
          May 2007
          1382 pages
          ISBN: 9781595936547
          DOI: 10.1145/1242572

          Copyright © 2007 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Article

          Acceptance Rates

          Overall acceptance rate: 1,899 of 8,196 submissions (23%)
