Robin: Extracting Visual and Textual Features from Web Pages

Oka, Mizuki; Tsukada, Hiroshi; Kato, Kazuhiko

doi:10.1007/11610113_71


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Web pages contain information in several forms. These include textual information such as words and visual information such as images, use of color, and layout. We propose a method of extracting the characteristic features from both the textual and visual information in Web pages. Our method enables seamless integration of the two types of information and automatic extraction of their characteristic features. Based on this method, we developed a proof-of-concept system called Robin, which is designed to provide users with an intuitive way of browsing search engine results. The results of an experimental evaluation of the system showed that it has the potential to be practical and effective.

Author information

Authors and Affiliations

University of Tsukuba,
Mizuki Oka, Hiroshi Tsukada & Kazuhiko Kato
Japan Science and Technology Agency,
Kazuhiko Kato

Editor information

Editors and Affiliations

School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Victoria University, Australia
Yanchun Zhang

About this paper

Cite this paper

Oka, M., Tsukada, H., Kato, K. (2006). Robin: Extracting Visual and Textual Features from Web Pages. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_71

DOI: https://doi.org/10.1007/11610113_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)
