Finite-State Approaches to Web Information Extraction

  • Conference paper
Information Extraction in the Web Era (SCIE 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2700)

Abstract

Information agents are emerging as an important approach to building next-generation value-added information services. An information agent is a distributed system that receives a goal through its user interface, gathers information relevant to this goal from a variety of sources, processes this content as appropriate, and delivers the results to its users. We focus on the second stage of this generic architecture: we survey a variety of information extraction techniques that enable information agents to automatically gather information from heterogeneous sources.

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kushmerick, N. (2003). Finite-State Approaches to Web Information Extraction. In: Pazienza, M.T. (ed.) Information Extraction in the Web Era. SCIE 2002. Lecture Notes in Computer Science (LNAI), vol 2700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45092-4_4

  • DOI: https://doi.org/10.1007/978-3-540-45092-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40579-5

  • Online ISBN: 978-3-540-45092-4

  • eBook Packages: Springer Book Archive
