Information extraction from calls for papers with conditional random fields and layout features

Artificial Intelligence Review

Abstract

For members of the research community it is vital to stay informed about conferences, workshops, and other research meetings relevant to their field. These events are typically announced in calls for papers (CFPs) that are distributed via mailing lists. We employ Conditional Random Fields for the task of extracting key information such as conference names, titles, dates, locations and submission deadlines from CFPs. Extracting this information from CFPs automatically has applications in building automated conference calendars and search engines for CFPs. We combine a variety of features, including generic token classes, domain-specific dictionaries and layout features. Layout features prove particularly useful in the absence of grammatical structure, improving average F1 by 30% in our experiments.
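The three feature families named in the abstract (generic token classes, domain-specific dictionaries, and layout features) can be illustrated with a small feature extractor. The following is only a minimal sketch under stated assumptions: the function names, the two tiny dictionaries, and the specific layout cues (indentation, neighbouring blank lines, line length) are hypothetical choices for illustration and are not taken from the article. The resulting per-token feature maps are the kind of input a linear-chain CRF toolkit would consume.

```python
import re

# Hypothetical miniature dictionaries (assumption: the article's real
# domain-specific dictionaries are not reproduced on this page).
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
EVENT_WORDS = {"conference", "workshop", "symposium"}

PUNCT = ".,;:()"


def token_class(token):
    """Map a token to a coarse, generic shape class."""
    if re.fullmatch(r"(19|20)\d\d", token):
        return "YEAR"
    if token.isdigit():
        return "NUMBER"
    if token.isalpha() and token.isupper() and len(token) > 1:
        return "ALLCAPS"
    if token[:1].isupper():
        return "CAPITALIZED"
    if token.isalpha():
        return "LOWERCASE"
    return "OTHER"


def line_features(lines, i):
    """Layout cues for line i, shared by every token on that line."""
    line = lines[i]
    indent = len(line) - len(line.lstrip(" "))
    return {
        "indented": indent > 0,
        "after_blank": i == 0 or not lines[i - 1].strip(),
        "before_blank": i == len(lines) - 1 or not lines[i + 1].strip(),
        "short_line": len(line.strip()) < 40,
    }


def extract_features(text):
    """Return a list of (token, feature dict) pairs for a plain-text CFP."""
    lines = text.split("\n")
    result = []
    for i, line in enumerate(lines):
        layout = line_features(lines, i)
        for j, raw in enumerate(line.split()):
            core = raw.strip(PUNCT)
            feats = {
                "class": token_class(core),
                "in_months": core.lower() in MONTHS,
                "in_event_words": core.lower() in EVENT_WORDS,
                "first_in_line": j == 0,
            }
            feats.update(layout)
            result.append((raw, feats))
    return result


if __name__ == "__main__":
    sample = "      ACL 2006\n\nSubmission deadline: January 15, 2006"
    for token, feats in extract_features(sample):
        print(token, feats)
```

In this sketch the layout features are computed once per line and copied onto each token, so a title set off by indentation and blank lines is distinguishable from body text even when the line contains no grammatical sentence.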

Author information

Corresponding author

Correspondence to Karl-Michael Schneider.

About this article

Cite this article

Schneider, KM. Information extraction from calls for papers with conditional random fields and layout features. Artif Intell Rev 25, 67–77 (2006). https://doi.org/10.1007/s10462-007-9019-4

  • DOI: https://doi.org/10.1007/s10462-007-9019-4