Abstract
We describe an application of information extraction to building a directory of scientific conference announcements. We employ a cascaded finite-state transducer to identify candidate conference names, titles, dates, locations and URLs in a conference announcement. To cope with the ungrammatical text typical of conference announcements, our system uses orthographic features of the text and a domain-specific tag set rather than general-purpose part-of-speech tags. Extraction accuracy is improved by recognizing other entities in the text that are not themselves extracted but could be confused with slot values. A scoring scheme based on simple heuristics selects among multiple extraction candidates. We also present an evaluation of our system.
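The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the system described in the paper: the tag set, the date pattern, the cue words, the score weights and the sample announcement (including the conference name "ACME") are all assumptions chosen for illustration. The sketch tags tokens from orthographic cues alone, matches a date pattern over the resulting tag string as one stage of a cascade, and uses a heuristic score to prefer the conference date over a submission deadline that could be confused with it.

```python
import re

MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

def tokenize(text):
    # Split into word tokens and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def tag(token):
    """Map a token to a one-letter tag using orthographic cues only.
    The tag set is hypothetical, not the one used in the paper."""
    if token.lower() in MONTHS:
        return "M"                      # month name
    if re.fullmatch(r"(19|20)\d{2}", token):
        return "Y"                      # four-digit year
    if re.fullmatch(r"\d{1,2}", token):
        return "D"                      # day of month
    if token.isalpha() and token.isupper() and len(token) > 1:
        return "A"                      # acronym-like, e.g. a conference name
    if token[0].isupper():
        return "C"                      # capitalized word
    if not token[0].isalnum():
        return "P"                      # punctuation
    return "w"                          # anything else

def tag_string(tokens):
    # One character per token, so regex offsets equal token indices.
    return "".join(tag(t) for t in tokens)

# Second stage: a date is MONTH DAY [PUNCT DAY] [PUNCT] YEAR.
DATE = re.compile(r"MD(?:PD)?P?Y")

# Cue words that mark a non-slot entity (a deadline); an assumed list.
DEADLINE_CUES = {"deadline", "due", "notification"}

def score(tokens, start, end):
    """Heuristic score for a date candidate; the weights are arbitrary."""
    s = 0.0
    context = {t.lower() for t in tokens[max(0, start - 4):start]}
    if context & DEADLINE_CUES:
        s -= 2.0                        # likely a deadline, not the event date
    if end - start > 4:
        s += 1.0                        # date ranges suggest the event itself
    s -= start / len(tokens)            # earlier mentions score higher
    return s

def extract_conference_date(text):
    tokens = tokenize(text)
    spans = [m.span() for m in DATE.finditer(tag_string(tokens))]
    if not spans:
        return None
    start, end = max(spans, key=lambda sp: score(tokens, *sp))
    return " ".join(tokens[start:end])

SAMPLE = ("ACME 2004: Fifth Workshop on Examples. February 15-21, 2004, "
          "Seoul. Submission deadline: October 10, 2003.")

print(extract_conference_date(SAMPLE))  # February 15 - 21 , 2004
```

Encoding each tag as a single character keeps the second stage a plain regular expression over the tag string, which is one simple way to emulate a finite-state cascade. A fuller system would add comparable stages for names, titles, locations and URLs.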
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schneider, KM. (2004). Using Information Extraction to Build a Directory of Conference Announcements. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_65
DOI: https://doi.org/10.1007/978-3-540-24630-5_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive