Abstract
Images on the Web appear alongside other textual content, referred to as the Web image context, which provides valuable information about the image semantics. Unfortunately, HTML documents are usually cluttered with content on many different topics, so the right image context must be determined precisely in order to deliver high-quality descriptions. Several methods that automatically determine and extract the Web image context from Web documents have been applied in different applications over the years. However, in these applications context extraction is only a preprocessing step, and therefore the quality of the extraction task has rarely been evaluated on its own. As a result, there is hardly any information about which extraction method to choose in order to get the best results. To address this need, this book chapter presents an evaluation framework that objectively measures and compares the quality of different Web Image Context Extraction (WICE) algorithms. The main parts of the framework are a large ground truth dataset consisting of diverse Web documents from real Web servers and objective quality measures tailored to the special characteristics of the image context extraction task. To demonstrate the capabilities of the framework, common extraction methods from the literature are implemented and integrated into it. Finally, the evaluation results are summarized and discussed.
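The abstract does not state which quality measures the framework uses. As a rough illustration only, comparing an extracted context against a ground truth context could be sketched with token-level precision, recall, and F1. The function names, the whitespace tokenization, and the choice of these three measures are assumptions made for this sketch, not the chapter's actual definitions.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def context_overlap_scores(extracted, ground_truth):
    """Return (precision, recall, f1) of extracted vs. ground truth tokens.

    Hypothetical helper: tokens are compared as multisets, so a repeated
    word only counts as often as it appears in both texts.
    """
    ext = Counter(tokenize(extracted))
    gt = Counter(tokenize(ground_truth))
    overlap = sum((ext & gt).values())  # multiset intersection size
    precision = overlap / sum(ext.values()) if ext else 0.0
    recall = overlap / sum(gt.values()) if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # An extractor that returns the true context plus two extra words
    # reaches full recall but reduced precision.
    print(context_overlap_scores("a red car parked outside", "a red car"))
```

Under these assumptions, an extractor that returns too much surrounding text is penalized in precision, while one that cuts the context short is penalized in recall.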
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Alcic, S., Conrad, S. (2015). Evaluating Web Image Context Extraction. In: Baughman, A., Gao, J., Pan, JY., Petrushin, V. (eds) Multimedia Data Mining and Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-14998-1_10
DOI: https://doi.org/10.1007/978-3-319-14998-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14997-4
Online ISBN: 978-3-319-14998-1
eBook Packages: Computer Science, Computer Science (R0)