DOI:10.1145/1600193.1600209
research-article

Indexing by permeability in block structured web pages

Published: 16 September 2009

Abstract

In this paper we present a model for indexing and querying web pages based on their visual rendering. In this model, pages are split into a set of visual blocks. The indexing of a block takes into account its content, its visual importance and, by permeability, the indexing of neighboring blocks. A page is modeled as a directed acyclic graph. Each node is associated with a block and labeled by the coefficient of importance of this block. Each edge is labeled by the coefficient of permeability of the target node content to the source node content. Importance and permeability coefficients cannot be manually quantified. In the second part of this paper, we present an experiment that consists of learning optimal permeability coefficients by gradient descent for indexing the images of a web page from the text blocks of this page. The dataset is drawn from real web pages of the train and test sets of the ImagEval task2 corpus. Results demonstrate an improvement in indexing when non-uniform block permeabilities are used.
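The graph model described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact combination formula, so the rule used here is an assumption: a block's propagated vector is its own term vector plus the permeability-weighted vectors of its source blocks, and the final index scales that vector by the block's importance. The function name `index_blocks`, the dictionary layout, and the sample blocks are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def index_blocks(local, importance, permeability):
    """Hypothetical permeability-based indexing over a block DAG.

    local:        block -> {term: weight}, the block's own content
    importance:   block -> float, node label
    permeability: (source, target) -> float, edge label
    Assumed rule (not taken from the paper):
        propagated[b] = local[b] + sum(beta * propagated[src])
        index[b]      = importance[b] * propagated[b]
    """
    # Map every block to the set of blocks whose content flows into it.
    sources = {block: set() for block in local}
    for src, dst in permeability:
        sources[dst].add(src)

    propagated, index = {}, {}
    # static_order() yields each block after all of its source blocks.
    for block in TopologicalSorter(sources).static_order():
        vec = dict(local[block])
        for src in sources[block]:
            beta = permeability[(src, block)]
            for term, weight in propagated[src].items():
                vec[term] = vec.get(term, 0.0) + beta * weight
        propagated[block] = vec
        index[block] = {t: importance[block] * w for t, w in vec.items()}
    return index


# An image block with no text of its own inherits terms from a caption block.
idx = index_blocks(
    local={"caption": {"eiffel": 1.0, "tower": 1.0}, "image": {}},
    importance={"caption": 1.0, "image": 1.0},
    permeability={("caption", "image"): 0.5},
)
print(idx["image"])
```

Processing blocks in topological order guarantees that every source block is fully indexed before its content is propagated, which is only possible because the page graph is acyclic.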

Published In

DocEng '09: Proceedings of the 9th ACM symposium on Document engineering
September 2009
264 pages
ISBN:9781605585758
DOI:10.1145/1600193

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. block importance
  2. block permeability
  3. content based image retrieval
  4. document indexing
  5. document retrieval

Qualifiers

  • Research-article

Conference

DocEng '09: ACM Symposium on Document Engineering
September 16 - 18, 2009
Munich, Germany

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%
