Abstract
We consider the problem of extracting texts related to a given keyword from Web pages collected by a search engine. Recently, we proposed a method that uses both structural and content information [1,2]. In our previous paper, we reported good extraction performance for this method, but only on a Ramen-shop dataset written in Japanese. In this paper, we evaluate it on datasets for other kinds of restaurants and on a dataset written in English. We also discuss some modifications aimed at improving its performance.
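The abstract does not describe SACwRApper itself; its details are in [1,2]. To give a rough feel for how structural and content information can work together, here is a toy sketch. It is our own assumption and not the authors' algorithm. It groups the text nodes of a page by their HTML tag path (the structural signal) and then picks the path whose texts mention the keyword most often (the content signal). Every text under that path is returned, so opinions that never repeat the keyword can still be extracted. The names `PathTextCollector`, `extract_keyword_texts` and `SAMPLE_HTML` are hypothetical and are used only for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical sample page: a navigation bar, a list of short reviews, and a footer line.
SAMPLE_HTML = """<html><body><div class="nav"><a>Home</a><a>Ramen</a></div>
<ul><li>Great ramen, rich broth.</li><li>The ramen was too salty.</li><li>Friendly staff.</li></ul>
<p>Ramen map</p></body></html>"""


class PathTextCollector(HTMLParser):
    """Collect non-empty text nodes, keyed by their tag path (e.g. 'html/body/ul/li')."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}  # tags that never get a close tag

    def __init__(self):
        super().__init__()
        self.stack = []
        self.texts = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts["/".join(self.stack)].append(text)


def extract_keyword_texts(html, keyword):
    """Return all texts under the tag path with the most keyword-bearing texts (toy heuristic)."""
    parser = PathTextCollector()
    parser.feed(html)
    parser.close()
    kw = keyword.lower()
    best_path, best_hits = None, 0
    for path, texts in parser.texts.items():
        hits = sum(kw in t.lower() for t in texts)  # content signal per structural group
        if hits > best_hits:  # strict '>' keeps the earliest path on ties
            best_path, best_hits = path, hits
    return parser.texts[best_path] if best_path is not None else []


if __name__ == "__main__":
    print(extract_keyword_texts(SAMPLE_HTML, "ramen"))
```

On the sample page, the three list items are returned, including "Friendly staff.". That text lacks the keyword but shares its tag path with the two keyword-bearing reviews. A real system would need far richer patterns than a single tag path.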
References
Hasegawa, H., Kudo, M., Nakamura, A.: Reputation Extraction Using Both Structural and Content Information. Hokkaido University TCS Technical Report TCS-TR-A-05-2, http://www-alg.ist.hokudai.ac.jp/tra.html (2005)
Hasegawa, H., Kudo, M., Nakamura, A.: Creation of Better Pattern Set for Reputation Extraction Using Both Structural and Content Information. In: Proc. of Data Engineering Workshop (2005) (in Japanese) (ISSN 1347-4413)
Kushmerick, N.: Wrapper Induction: Efficiency and expressiveness. Artificial Intelligence 118, 15–68 (2000)
Murakami, Y., Sakamoto, H., Arimura, H., Arikawa, S.: Extracting text data from HTML documents. The Information Processing Society of Japan (IPSJ) Transactions on Mathematical Modeling and its Applications (TOM) 42(SIG 14 (TOM 5)), 39–49 (2001) (in Japanese)
Cohen, W.W., Hurst, M., Jensen, L.S.: A flexible learning system for wrapping tables and lists in HTML documents. In: Proc. 11th Int’l World Wide Web Conf., pp. 232–241 (2002)
Tateishi, K., Ishiguro, Y., Fukushima, T.: A Reputation Search Engine that Collects People’s Opinions by Information Extraction Technology. The Information Processing Society of Japan (IPSJ) Transactions on Databases (TOD) 45(SIG 07) (2004)
Morinaga, S., Yamanishi, K., Tateishi, K., Fukushima, T.: Mining product reputations on the web. In: Proc. SIGKDD 2002 (2002)
Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proc. SIGKDD 2004 (2004)
Chang, C.H., Lui, S.C.: IEPAD: Information extraction based on pattern discovery. In: Proc. 10th Int’l World Wide Web Conf., pp. 4–15 (2001)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hasegawa, H., Kudo, M., Nakamura, A. (2005). Empirical Study on Usefulness of Algorithm SACwRApper for Reputation Extraction from the WWW. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science(), vol 3684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11554028_93
DOI: https://doi.org/10.1007/11554028_93
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28897-8
Online ISBN: 978-3-540-31997-9
eBook Packages: Computer Science, Computer Science (R0)