Abstract
This paper proposes a new method of building information extraction rules for Web documents using a user interface agent that combines the manual and automatic approaches to rule generation. We adopt a supervised learning scheme in which the interface agent obtains from the user what to extract from a document, and XML-based wrappers are generated from these inputs. The interface agent is used not only to generate new extraction rules but also to modify and extend existing ones, improving the precision and recall of Web information extraction systems. We tested the system in a series of experiments, and the results are promising.
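The abstract does not give the rule format or the learning procedure, so the following is only a minimal sketch of the general idea, assuming a simple left/right-delimiter wrapper: the user marks a few values to extract, the text shared by their surroundings becomes the rule, and the rule is serialized as XML. The function names (`learn_rule`, `apply_rule`), the `<rule>`, `<left>` and `<right>` element names, and the sample page are all hypothetical and are not taken from the paper.

```python
import os
import re
import xml.etree.ElementTree as ET


def common_suffix(strings):
    """Return the longest string that every input ends with."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def learn_rule(field, document, examples, context=20):
    """Build an XML rule from values the user marked in the document.

    Assumption: each example occurs in the document, and its first
    occurrence is the one the user marked.
    """
    lefts, rights = [], []
    for value in examples:
        start = document.index(value)
        end = start + len(value)
        lefts.append(document[max(0, start - context):start])
        rights.append(document[end:end + context])

    # The delimiters are whatever text all marked examples share
    # immediately before and immediately after the value.
    left = common_suffix(lefts)
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("examples share no common delimiters")

    rule = ET.Element("rule", {"field": field})
    ET.SubElement(rule, "left").text = left
    ET.SubElement(rule, "right").text = right
    return rule


def apply_rule(rule, document):
    """Extract every value enclosed by the rule's delimiters."""
    left = rule.findtext("left")
    right = rule.findtext("right")
    pattern = re.escape(left) + "(.*?)" + re.escape(right)
    return re.findall(pattern, document, flags=re.DOTALL)


# Hypothetical product listing; the user marks the first and last prices.
page = (
    "<li><b>Title:</b> Dune <i>$9.99</i></li>"
    "<li><b>Title:</b> Emma <i>$4.50</i></li>"
    "<li><b>Title:</b> Ulysses <i>$12.00</i></li>"
)
rule = learn_rule("price", page, ["$9.99", "$12.00"])
print(ET.tostring(rule, encoding="unicode"))
print(apply_rule(rule, page))
```

In this sketch, supplying more or different examples changes the learned delimiters, which loosely mirrors using the agent to modify or extend an existing rule.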
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, J., Kim, TH., Choi, J. (2005). An Interface Agent for Wrapper-Based Information Extraction. In: Barley, M.W., Kasabov, N. (eds) Intelligent Agents and Multi-Agent Systems. PRIMA 2004. Lecture Notes in Computer Science, vol 3371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32128-6_22
DOI: https://doi.org/10.1007/978-3-540-32128-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25340-2
Online ISBN: 978-3-540-32128-6
eBook Packages: Computer Science, Computer Science (R0)