Empirical Study on Usefulness of Algorithm SACwRApper for Reputation Extraction from the WWW

Hasegawa, Hiroyuki; Kudo, Mineichi; Nakamura, Atsuyoshi

doi:10.1007/11554028_93

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3684)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

We consider the problem of extracting texts related to a given keyword from Web pages collected by a search engine. Recently, we proposed a method that uses both structural and content information [1,2]. In our previous paper, we reported good extraction performance of our method only for a Ramen-shop dataset written in Japanese. In this paper, we evaluate it on datasets for other kinds of restaurants, and also on a dataset written in English. We discuss some modifications for performance improvement.

References

1. Hasegawa, H., Kudo, M., Nakamura, A.: Reputation Extraction Using Both Structural and Content Information. Hokkaido University TCS Technical Report TCS-TR-A-05-2, http://www-alg.ist.hokudai.ac.jp/tra.html (2005)
2. Hasegawa, H., Kudo, M., Nakamura, A.: Creation of Better Pattern Set for Reputation Extraction Using Both Structural and Content Information. In: Proc. of Data Engineering Workshop (2005) (in Japanese) (ISSN 1347-4413)
3. Kushmerick, N.: Wrapper Induction: Efficiency and expressiveness. Artificial Intelligence 118, 15–68 (2000)
4. Murakami, Y., Sakamoto, H., Arimura, H., Arikawa, S.: Extracting text data from html documents. The Information Processing Society of Japan (IPSJ) Transactions on Mathematical Modeling and its Applications (TOM) 42(SIG 14 (TOM5)), 39–49 (2001) (in Japanese)
5. Cohen, W.W., Hurst, M., Jensen, L.S.: A flexible learning system for wrapping tables and lists in html documents. In: Proc. 11th Int’l World Wide Web Conf., pp. 232–241 (2002)
6. Tateishi, K., Ishiguro, Y., Fukushima, T.: A Reputation Search Engine that Collects People’s Opinions by Information Extraction Technology. The Information Processing Society of Japan (IPSJ) Transactions on Databases (TOD) 45(SIG 07) (2004)
7. Morinaga, S., Yamanishi, K., Tateishi, K., Fukushima, T.: Mining product reputations on the web. In: Proc. SIGKDD 2002 (2002)
8. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proc. SIGKDD 2004 (2004)
9. Chang, C.H., Lui, S.C.: IEPAD: Information extraction based on pattern discovery. In: Proc. 10th Int’l World Wide Web Conf., pp. 4–15 (2001)

Author information

Authors and Affiliations

Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, 060-0814, Japan
Hiroyuki Hasegawa, Mineichi Kudo & Atsuyoshi Nakamura

Editor information

Editors and Affiliations

School of Business, La Trobe University, 3086, Melbourne, Victoria, Australia
Rajiv Khosla
Centre for SMART systems Engineering Research Centre, University of Brighton, Moulsecoomb, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, 5095, Mawson Lakes, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Hasegawa, H., Kudo, M., Nakamura, A. (2005). Empirical Study on Usefulness of Algorithm SACwRApper for Reputation Extraction from the WWW. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11554028_93

DOI: https://doi.org/10.1007/11554028_93
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28897-8
Online ISBN: 978-3-540-31997-9
eBook Packages: Computer Science, Computer Science (R0)
