Abstract
Information agents are emerging as an important approach to building next-generation value-added information services. An information agent is a distributed system that receives a goal through its user interface, gathers information relevant to this goal from a variety of sources, processes this content as appropriate, and delivers the results to the users. We focus on the second stage in this generic architecture. We survey a variety of information extraction techniques that enable information agents to automatically gather information from heterogeneous sources.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kushmerick, N. (2003). Finite-State Approaches to Web Information Extraction. In: Pazienza, M.T. (ed.) Information Extraction in the Web Era. SCIE 2002. Lecture Notes in Computer Science, vol 2700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45092-4_4
DOI: https://doi.org/10.1007/978-3-540-45092-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40579-5
Online ISBN: 978-3-540-45092-4
eBook Packages: Springer Book Archive