Abstract
For members of the research community it is vital to stay informed about conferences, workshops, and other research meetings relevant to their field. These events are typically announced in calls for papers (CFPs) that are distributed via mailing lists. We employ Conditional Random Fields for the task of extracting key information such as conference names, titles, dates, locations and submission deadlines from CFPs. Extracting this information from CFPs automatically has applications in building automated conference calendars and search engines for CFPs. We combine a variety of features, including generic token classes, domain-specific dictionaries and layout features. Layout features prove particularly useful in the absence of grammatical structure, improving average F1 by 30% in our experiments.
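The three feature families named in the abstract (generic token classes, domain-specific dictionaries, layout features) can be illustrated with a minimal sketch. The code below is not the author's implementation: the function names, the dictionary contents, and the specific layout cues (indentation, position in the line, a preceding blank line) are all assumptions chosen for illustration. It only shows how per-token feature dictionaries of this kind might be assembled before being passed to a sequence labeller such as a CRF.

```python
import re

# Hypothetical domain dictionaries; the article's actual dictionaries are not shown.
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
LOCATION_WORDS = {"australia", "canada", "germany", "usa"}


def token_class(token):
    """Map a token to a coarse, generic class (assumed class inventory)."""
    if token.isdigit():
        return "NUMBER"
    if token.isalpha():
        if token.isupper():
            return "ALLCAPS"
        if token[0].isupper():
            return "CAPITALIZED"
        return "LOWERCASE"
    if all(not ch.isalnum() for ch in token):
        return "PUNCT"
    return "MIXED"


def extract_features(text):
    """Return one feature dict per token, in document order."""
    features = []
    prev_blank = True  # treat the document start like a line after a blank line
    for line in text.splitlines():
        if not line.strip():
            prev_blank = True
            continue
        indent = len(line) - len(line.lstrip())
        tokens = re.findall(r"\w+|[^\w\s]", line)
        for i, tok in enumerate(tokens):
            lower = tok.lower()
            features.append({
                # generic token features
                "token": lower,
                "class": token_class(tok),
                # dictionary features
                "in_month_dict": lower in MONTHS,
                "in_location_dict": lower in LOCATION_WORDS,
                # layout features (assumed cues, independent of grammar)
                "first_in_line": i == 0,
                "last_in_line": i == len(tokens) - 1,
                "line_indented": indent > 0,
                "after_blank_line": prev_blank,
            })
        prev_blank = False
    return features
```

Layout cues like these depend only on where a token sits on the page, which is why they remain available in CFP text that consists of headings, lists, and date lines rather than full sentences.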
Cite this article
Schneider, KM. Information extraction from calls for papers with conditional random fields and layout features. Artif Intell Rev 25, 67–77 (2006). https://doi.org/10.1007/s10462-007-9019-4
DOI: https://doi.org/10.1007/s10462-007-9019-4