Abstract
With the growth of the Internet, the World Wide Web has become significant infrastructure in fields such as business, commerce, and education. Accordingly, users now gather information through the Internet. However, as the number of Web pages increases, it becomes difficult for a user to collect the desired information. Advanced Web search engines may provide a solution to some extent, but it is still up to the user to summarize or extract meaningful information from the retrieval results. From this viewpoint, this paper addresses a method for generating table-style data from heterogeneous Web pages that reflects a user's intention. To achieve this, the method utilizes a user's instantiated example in a table, in addition to the table's column labels. Based on the user's instantiated example, meaningful information is extracted using pattern matching and an N-gram method. We apply this method to 57 pages from 27 travel agencies to evaluate whether the proposed method is effective. As a result, the precision rate was 88% and the recall rate was 68%.
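The abstract names N-gram matching and precision/recall evaluation without giving details. The following is a minimal sketch of what such steps could look like, not the authors' implementation. It assumes character bigrams compared with a Dice coefficient, and every function name (`char_ngrams`, `dice_similarity`, `best_match`, `precision_recall`) and every sample string is hypothetical.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams in text (whitespace removed, lowercased)."""
    s = "".join(text.split()).lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient between the character n-gram sets of a and b.

    Assumption: the abstract does not say which similarity measure is used;
    Dice is chosen here only for illustration.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_match(example, candidates, n=2):
    """Pick the candidate string most similar to the user's example value."""
    return max(candidates, key=lambda c: dice_similarity(example, c, n))


def precision_recall(extracted, correct):
    """Precision and recall of the extracted items against a gold set."""
    extracted, correct = set(extracted), set(correct)
    hits = len(extracted & correct)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical text fragments taken from a travel agency page.
    fragments = ["Contact us", "Kyoto 2 days tour", "Site map"]
    print(best_match("Kyoto 3 days", fragments))
    print(precision_recall(["a", "b", "c"], ["a", "b", "d", "e"]))
```

Under these assumptions, a value the user typed into one table cell serves as the clue, and the page fragment sharing the most character bigrams with it is treated as the matching field. Precision and recall are then computed over the set of extracted cells.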
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shimada, J., Oka, H., Akiyoshi, M., Komoda, N. (2010). Information Extraction from Heterogeneous Web Sites Using Clue Complement Process Based on a User’s Instantiated Example. In: de Leon F. de Carvalho, A.P., Rodríguez-González, S., De Paz Santana, J.F., Rodríguez, J.M.C. (eds) Distributed Computing and Artificial Intelligence. Advances in Intelligent and Soft Computing, vol 79. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14883-5_74
DOI: https://doi.org/10.1007/978-3-642-14883-5_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14882-8
Online ISBN: 978-3-642-14883-5
eBook Packages: Engineering, Engineering (R0)