DOI: 10.1145/1860559.1860610
poster

Style and branding elements extraction from business web sites

Published: 21 September 2010

ABSTRACT

We describe a method for extracting style and branding elements from multiple web pages of a given site for content repurposing. Style and branding elements convey the site owner's values and help connect with target prospects. They are manifested through logos, graphical elements, background colors, font styles, font colors, and other illustrations. Our method automatically extracts color and image elements that appear frequently and prominently on multiple pages throughout the site. We rely on a DOM tree matching method to obtain the frequency of recurring elements, and we use the relative sizes and positions of elements to determine their type. The approximate locations of these elements also give the content repurposing engine a clue as to where to place them in the repurposed document. The obtained results show that the proposed method can efficiently extract style and branding elements with high accuracy.
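
The two steps named in the abstract, counting how often an element recurs across pages and then typing it by relative size and position, can be illustrated with a minimal sketch. This is not the authors' implementation: exact matching on a (DOM path, image URL) signature stands in for full DOM tree matching, and the `Element` fields, the `min_ratio` cutoff, the category names, and every threshold in `classify` are hypothetical values chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A rendered image element. All fields are illustrative assumptions."""
    path: str  # DOM path, e.g. "html/body/div/img"
    src: str   # image URL
    x: float   # left edge as a fraction of page width (0..1)
    y: float   # top edge as a fraction of page height (0..1)
    w: float   # width as a fraction of page width
    h: float   # height as a fraction of page height


def recurring(pages, min_ratio=0.8):
    """Return elements whose (path, src) signature occurs on at least
    min_ratio of the pages. Simplified stand-in for DOM tree matching."""
    counts = Counter()
    first_seen = {}
    for page in pages:
        seen = set()  # count each signature at most once per page
        for el in page:
            sig = (el.path, el.src)
            if sig not in seen:
                seen.add(sig)
                counts[sig] += 1
                first_seen.setdefault(sig, el)
    return [first_seen[s] for s, c in counts.items()
            if c / len(pages) >= min_ratio]


def classify(el):
    """Guess an element type from relative size and position.
    Thresholds are hypothetical, not taken from the paper."""
    area = el.w * el.h
    if area >= 0.5:
        return "background"
    if el.y < 0.2 and el.w >= 0.6:
        return "banner"
    if el.y < 0.2 and area < 0.1:
        return "logo"
    return "graphic"


# Hypothetical three-page site: the logo is on every page,
# the banner on two pages, the photo on one.
logo = Element("html/body/div/img", "/logo.png", 0.02, 0.02, 0.15, 0.08)
banner = Element("html/body/div[2]/img", "/banner.jpg", 0.0, 0.1, 1.0, 0.15)
photo = Element("html/body/div[3]/img", "/team.jpg", 0.3, 0.5, 0.4, 0.3)
pages = [[logo, banner, photo], [logo, banner], [logo]]

for el in recurring(pages, min_ratio=0.6):
    print(el.src, classify(el), (el.x, el.y))
```

Keeping the normalized position alongside the type mirrors the abstract's point that approximate locations help a repurposing engine decide where to place each element.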

      • Published in

        DocEng '10: Proceedings of the 10th ACM symposium on Document engineering
        September 2010
        298 pages
        ISBN: 9781450302319
        DOI: 10.1145/1860559

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        DocEng '10 Paper Acceptance Rate: 13 of 42 submissions, 31%
        Overall Acceptance Rate: 178 of 537 submissions, 33%