
Empirical Study on Usefulness of Algorithm SACwRApper for Reputation Extraction from the WWW

Conference paper, Knowledge-Based Intelligent Information and Engineering Systems (KES 2005)

Abstract

We consider the problem of extracting texts related to a given keyword from Web pages collected by a search engine. Recently, we proposed a method that uses both structural and content information [1,2]. Previously, we reported good extraction performance for this method, but only on a Ramen-shop dataset written in Japanese. In this paper, we evaluate it on datasets for other kinds of restaurants and on a dataset written in English. We also discuss some modifications that improve performance.


References

  1. Hasegawa, H., Kudo, M., Nakamura, A.: Reputation Extraction Using Both Structural and Content Information. Hokkaido University TCS Technical Report TCS-TR-A-05-2, http://www-alg.ist.hokudai.ac.jp/tra.html (2005)

  2. Hasegawa, H., Kudo, M., Nakamura, A.: Creation of Better Pattern Set for Reputation Extraction Using Both Structural and Content Information. In: Proc. of Data Engineering Workshop (2005) (in Japanese) (ISSN 1347-4413)

  3. Kushmerick, N.: Wrapper Induction: Efficiency and expressiveness. Artificial Intelligence 118, 15–68 (2000)

  4. Murakami, Y., Sakamoto, H., Arimura, H., Arikawa, S.: Extracting text data from HTML documents. The Information Processing Society of Japan (IPSJ) Transactions on Mathematical Modeling and its Applications (TOM) 42(SIG 14 (TOM 5)), 39–49 (2001) (in Japanese)

  5. Cohen, W.W., Hurst, M., Jensen, L.S.: A flexible learning system for wrapping tables and lists in HTML documents. In: Proc. 11th Int’l World Wide Web Conf., pp. 232–241 (2002)

  6. Tateishi, K., Ishiguro, Y., Fukushima, T.: A Reputation Search Engine that Collects People’s Opinions by Information Extraction Technology. The Information Processing Society of Japan (IPSJ) Transactions on Databases (TOD) 45(SIG 07) (2004)

  7. Morinaga, S., Yamanishi, K., Tateishi, K., Fukushima, T.: Mining product reputations on the web. In: Proc. SIGKDD 2002 (2002)

  8. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proc. SIGKDD 2004 (2004)

  9. Chang, C.H., Lui, S.C.: IEPAD: Information extraction based on pattern discovery. In: Proc. 10th Int’l World Wide Web Conf., pp. 4–15 (2001)


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hasegawa, H., Kudo, M., Nakamura, A. (2005). Empirical Study on Usefulness of Algorithm SACwRApper for Reputation Extraction from the WWW. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11554028_93


  • DOI: https://doi.org/10.1007/11554028_93

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28897-8

  • Online ISBN: 978-3-540-31997-9

  • eBook Packages: Computer Science, Computer Science (R0)
