Abstract
Web pages contain information in several forms. These include textual information such as words and visual information such as images, use of color, and layout. We propose a method of extracting the characteristic features from both the textual and visual information in Web pages. Our method enables seamless integration of the two types of information and automatic extraction of their characteristic features. Based on this method, we developed a proof-of-concept system called Robin, which is designed to provide users with an intuitive way of browsing search engine results. The results of an experimental evaluation of the system showed that it has the potential to be practical and effective.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oka, M., Tsukada, H., Kato, K. (2006). Robin: Extracting Visual and Textual Features from Web Pages. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_71
DOI: https://doi.org/10.1007/11610113_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science (R0)