Information extraction from calls for papers with conditional random fields and layout features

Schneider, Karl-Michael

doi:10.1007/s10462-007-9019-4

Published: 18 August 2007

Volume 25, pages 67–77 (2006)

Artificial Intelligence Review

Abstract

For members of the research community it is vital to stay informed about conferences, workshops, and other research meetings relevant to their field. These events are typically announced in calls for papers (CFPs) that are distributed via mailing lists. We employ Conditional Random Fields for the task of extracting key information such as conference names, titles, dates, locations and submission deadlines from CFPs. Extracting this information from CFPs automatically has applications in building automated conference calendars and search engines for CFPs. We combine a variety of features, including generic token classes, domain-specific dictionaries and layout features. Layout features prove particularly useful in the absence of grammatical structure, improving average F1 by 30% in our experiments.
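
The three feature families named in the abstract (generic token classes, domain-specific dictionaries, and layout features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the tiny `MONTHS` dictionary, the layout cues, and the functions `token_class`, `line_features` and `extract_features` are hypothetical and are not taken from the article, whose actual feature set is not given in this abstract.

```python
import re

# Hypothetical mini-dictionary; the article's real dictionaries are not shown here.
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


def token_class(token):
    """Map a token to a coarse, generic class (an assumed scheme)."""
    if re.fullmatch(r"\d{4}", token):
        return "YEAR_LIKE"
    if token.isdigit():
        return "NUMBER"
    if token.isalpha() and token.isupper():
        return "ALLCAPS"
    if token[:1].isupper():
        return "CAPITALIZED"
    if token.isalpha():
        return "LOWER"
    return "OTHER"


def line_features(line, prev_line_blank):
    """Layout cues for one line: indentation, preceding blank line, length."""
    stripped = line.strip()
    indent = len(line) - len(line.lstrip(" "))
    return {
        "indented": indent > 0,
        "after_blank": prev_line_blank,
        "short_line": 0 < len(stripped) < 40,
    }


def extract_features(text):
    """Return (token, feature dict) pairs for every token in a plain-text CFP."""
    rows = []
    prev_blank = True  # treat the start of the text like a blank line
    for line in text.splitlines():
        if not line.strip():
            prev_blank = True
            continue
        layout = line_features(line, prev_blank)
        tokens = line.split()
        for i, tok in enumerate(tokens):
            feats = {
                "class": token_class(tok),
                "in_month_dict": tok.strip(".,").lower() in MONTHS,
                "first_in_line": i == 0,
                "last_in_line": i == len(tokens) - 1,
            }
            feats.update(layout)  # every token inherits its line's layout cues
            rows.append((tok, feats))
        prev_blank = False
    return rows


if __name__ == "__main__":
    sample = "    ACL 2007\n\nSubmission deadline: January 15, 2007\n"
    for tok, feats in extract_features(sample):
        print(tok, feats)
```

Feature dictionaries of this shape are the kind of per-token input that a linear-chain CRF trainer typically consumes. The sketch attaches line-level layout cues to each token so that they remain available where a CFP has little grammatical structure, which is the situation the abstract singles out.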

References

Carvalho VR, Cohen WW (2004) Learning to extract signature and reply lines from email. In: Proceedings of the first conference on email and anti-spam (CEAS), Mountain View, CA, 2004
Cox C, Nicolson J, Finkel JR, Manning C, Langley P (2005) Template sampling for leveraging domain knowledge in information extraction. In: PASCAL Challenges Workshop, Southampton, UK, 2005
Della Pietra S, Della Pietra VJ, Lafferty J (1997) Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(4):380–393
Hurst M, Nasukawa T (2000) Layout and language: integrating spatial and linguistic knowledge for layout understanding tasks. In: Proceedings of the 18th international conference on computational linguistics (COLING’00), Saarbrücken, Germany, pp 334–340
Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th international conference on machine learning (ICML-2001), Morgan Kaufmann, San Francisco, CA pp 282–289
Lin W, Yangarber R, Grishman R (2003) Bootstrapped learning of semantic classes from positive and negative examples. In: Proceedings of the ICML-2003 workshop on the continuum from labeled to unlabeled data, Washington, DC, pp 103–110
Malouf R (2002) A comparison of algorithms for maximum entropy parameter estimation. In: Proceedings of the sixth conference on natural language learning (CoNLL-2002), Taipei, Taiwan, pp 49–55
McCallum A, Freitag D, Pereira F (2000) Maximum entropy markov models for information extraction and segmentation. In: Proceedings of the 17th international conference on machine learning (ICML-2000), Morgan Kaufmann, San Francisco, CA, pp 591–598
McCallum AK (2002) MALLET: a machine learning for language toolkit. http://mallet.cs.umass.edu/
Peng F, McCallum A (2004) Accurate information extraction from research papers using conditional random fields. In: Proceedings of HLT-NAACL 2004, Boston, Massachusetts, pp 329–336
Pinto D, McCallum A, Wei X, Croft WB (2003) Table extraction using conditional random fields. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR 2003), Toronto, Canada, pp 235–242
Ramshaw LA, Marcus MP (1995) Text chunking using transformation-based learning. In: Proceedings of the ACL third workshop on very large corpora. Association for Computational Linguistics, pp 82–94
Riloff E, Jones R (1999) Learning dictionaries for information extraction by multi-level bootstrapping. In: Proceedings of the 16th national conference on artificial intelligence and 11th conference on innovative applications of artificial intelligence (AAAI/IAAI 1999), Orlando, Florida, pp 474–479, AAAI Press
Settles B (2004) Biomedical named entity recognition using conditional random fields and rich feature sets. In: Proceedings of the international joint workshop on natural language processing in biomedicine and its applications (JNLPBA), Geneva, Switzerland, pp 104–107
Sha F, Pereira FCN (2003) Shallow parsing with conditional random fields. In: Proceedings of HLT-NAACL 2003, Edmonton, Canada, pp 134–141

Author information

Authors and Affiliations

Textkernel B.V., Amsterdam, The Netherlands
Karl-Michael Schneider

Corresponding author

Correspondence to Karl-Michael Schneider.

About this article

Cite this article

Schneider, KM. Information extraction from calls for papers with conditional random fields and layout features. Artif Intell Rev 25, 67–77 (2006). https://doi.org/10.1007/s10462-007-9019-4

Published: 18 August 2007
Issue Date: April 2006
DOI: https://doi.org/10.1007/s10462-007-9019-4
