An Interface Agent for Wrapper-Based Information Extraction

  • Conference paper
Intelligent Agents and Multi-Agent Systems (PRIMA 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3371)

Abstract

This paper proposes a new method of building information extraction rules for Web documents by exploiting a user interface agent that combines the manual and automatic approaches to rule generation. We adopt a supervised learning scheme in which the interface agent obtains information from the user about what to extract from a document, and XML-based wrappers are generated from these inputs. The interface agent is used not only to generate new extraction rules but also to modify and extend existing ones, improving the precision and recall of Web information extraction systems. We conducted a series of experiments to test the system, and the results are promising.
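The general idea of turning a user's selection into an XML-encoded extraction rule can be illustrated with a minimal sketch. It is not the system described in the paper: it assumes a simple left/right delimiter rule, and the function names (`induce_rule`, `apply_rule`), the `<rule>`, `<left>`, `<right>` element names, and the `context` parameter are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def induce_rule(doc: str, example: str, field: str, context: int = 4) -> ET.Element:
    """Build an XML rule from one user-marked example (hypothetical rule format).

    The rule stores the `context` characters immediately before and after
    the marked text as left and right delimiters.
    """
    start = doc.index(example)  # raises ValueError if the example is absent
    end = start + len(example)
    left = doc[max(0, start - context):start]
    right = doc[end:end + context]
    if not left or not right:
        raise ValueError("example must have text on both sides to form delimiters")
    rule = ET.Element("rule", {"field": field})
    ET.SubElement(rule, "left").text = left
    ET.SubElement(rule, "right").text = right
    return rule


def apply_rule(rule: ET.Element, doc: str) -> List[str]:
    """Return every substring of doc enclosed by the rule's delimiters."""
    left = rule.findtext("left")
    right = rule.findtext("right")
    if not left or not right:
        raise ValueError("rule must contain non-empty left and right delimiters")
    results = []
    pos = 0
    while True:
        i = doc.find(left, pos)
        if i == -1:
            break
        s = i + len(left)
        j = doc.find(right, s)
        if j == -1:
            break
        results.append(doc[s:j])
        pos = j + len(right)  # continue scanning after this match
    return results


# The "user" marks "$10" as the price in a toy page.
page = "<li><b>Price:</b> $10</li><li><b>Price:</b> $25</li>"
rule = induce_rule(page, "$10", "price")
print(ET.tostring(rule, encoding="unicode"))
print(apply_rule(rule, page))
```

In this sketch a single marked example generalizes to the second list item because both share the same surrounding markup. Modifying or extending a rule, in the sense used in the abstract, would correspond here to editing the stored delimiters or adding further rules when precision or recall falls short.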

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, J., Kim, TH., Choi, J. (2005). An Interface Agent for Wrapper-Based Information Extraction. In: Barley, M.W., Kasabov, N. (eds) Intelligent Agents and Multi-Agent Systems. PRIMA 2004. Lecture Notes in Computer Science, vol 3371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32128-6_22

  • DOI: https://doi.org/10.1007/978-3-540-32128-6_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25340-2

  • Online ISBN: 978-3-540-32128-6

  • eBook Packages: Computer Science, Computer Science (R0)
