Abstract
Information extraction (IE) helps build advanced tools for text processing applications such as electronic publishing and information retrieval. These applications differ in their requirements on the input text, the output information, and the resources available for information extraction. Since existing IE technologies are application specific, extensive expert work is required to meet the needs of each new application. Most users, however, lack this expertise and therefore need a tool for easily creating IE systems tailored to their needs. We introduce a framework that consists of (1) an extensible set of advanced IE technologies together with a description of the properties of their input, output, and resources, and (2) a generator that selects the relevant technologies for a specific application and integrates them into an IE system. A prototype of the framework is presented, and the generation of IE systems for two example applications is illustrated. The results are presented as guidelines for further development of the framework.
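The two parts of the framework, a described set of technologies and a generator that selects among them, can be illustrated with a minimal sketch. It is not the authors' prototype: the `Technology` record, the `generate` function, the property names, and the greedy selection strategy are all assumptions made for illustration, since the abstract does not specify how properties are represented or matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technology:
    """Hypothetical description of one IE technology (not from the paper)."""
    name: str
    input_props: frozenset   # properties the input text must have
    output_props: frozenset  # kinds of information the technology produces
    resources: frozenset     # resources it needs, e.g. a lexicon


def generate(library, text_props, wanted_output, available_resources):
    """Select usable technologies for one application (assumed greedy strategy).

    A technology is usable when the application's text and resources satisfy
    its requirements. It is selected if it adds output that is still missing.
    Returns the selected technologies and any output that stays uncovered.
    """
    pipeline, covered = [], set()
    for tech in library:
        usable = (tech.input_props <= text_props
                  and tech.resources <= available_resources)
        contributes = (tech.output_props & wanted_output) - covered
        if usable and contributes:
            pipeline.append(tech)
            covered |= contributes
    return pipeline, wanted_output - covered


# Illustrative library; every entry is invented for this example.
LIBRARY = [
    Technology("name finder", frozenset({"plain_text"}),
               frozenset({"person_names"}), frozenset()),
    Technology("gazetteer lookup", frozenset({"plain_text"}),
               frozenset({"locations"}), frozenset({"gazetteer"})),
    Technology("table reader", frozenset({"html"}),
               frozenset({"prices"}), frozenset()),
]

if __name__ == "__main__":
    selected, missing = generate(
        LIBRARY,
        text_props={"plain_text"},
        wanted_output={"person_names", "locations"},
        available_resources=set(),
    )
    print([t.name for t in selected], missing)
```

In this sketch, an application without a gazetteer gets only the name finder, and `locations` is reported as uncovered. Reporting the gap rather than failing is one possible design choice for showing a non-expert user which resource is missing.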
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aberg, C., Shahmehri, N. (2002). A Framework for Generating Task Specific Information Extraction Systems. In: Hacid, MS., Raś, Z.W., Zighed, D.A., Kodratoff, Y. (eds) Foundations of Intelligent Systems. ISMIS 2002. Lecture Notes in Computer Science, vol 2366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48050-1_53
DOI: https://doi.org/10.1007/3-540-48050-1_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43785-7
Online ISBN: 978-3-540-48050-1
eBook Packages: Springer Book Archive