Indirect Supervised Learning of Content Selection Logic

Duboue, Pablo A.

doi:10.1007/978-3-540-27823-8_5

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3123)

Abstract

I investigate the automatic acquisition of Content Selection (CS) rules. This is a desirable goal, because the CS problem is quite domain dependent. My learning approach uses a loosely aligned Text-Data corpus. Such corpora are increasingly popular in learning for NLG because they are readily available and do not require expensive hand labelling. However, they provide only indirect information about whether each semantic datum was selected. My proposed solution to this problem is Indirect Supervised Learning, and other NLG problems that learn from loosely aligned Text-Data corpora share it. It has two steps. First, the loosely aligned Text-Data corpus is transformed into a data set with classification labels. Second, supervised learning machinery acquires the CS rules from this data set. I evaluate the approach by comparing the output of my system with the information selected by human authors in unseen texts.
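The two-step scheme summarized above can be illustrated with a minimal sketch. This is not the author's implementation. The labelling step uses plain case-insensitive substring matching as a stand-in for the paper's alignment procedure. The learner is a per-attribute majority vote rather than the paper's supervised learning machinery. The evaluation uses precision, recall, and F1, which are assumed metrics. The datum representation, the function names (`label_data`, `learn_rules`, `select`, `precision_recall_f1`), and the toy corpus are all hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: a semantic datum is an (attribute, value) pair.
Datum = tuple[str, str]


def label_data(data: list[Datum], text: str) -> list[tuple[Datum, bool]]:
    """Step 1 (simplified stand-in): turn one loosely aligned data/text pair
    into labelled examples. A datum is marked 'selected' when its value string
    occurs (case-insensitively) in the paired text."""
    lowered = text.lower()
    return [(d, d[1].lower() in lowered) for d in data]


def learn_rules(labelled: list[tuple[Datum, bool]]) -> set[str]:
    """Step 2 (stand-in learner): keep an attribute as a CS rule when it was
    labelled 'selected' in at least half of its training occurrences."""
    counts = defaultdict(lambda: [0, 0])  # attribute -> [selected, total]
    for (attribute, _value), selected in labelled:
        counts[attribute][0] += int(selected)
        counts[attribute][1] += 1
    return {a for a, (sel, tot) in counts.items() if 2 * sel >= tot}


def select(data: list[Datum], rules: set[str]) -> set[Datum]:
    """Apply the learned rules to new input data."""
    return {d for d in data if d[0] in rules}


def precision_recall_f1(predicted: set[Datum], gold: set[Datum]) -> tuple[float, float, float]:
    """Compare the system's selection with what a human author selected."""
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy, invented corpus of (data, text) pairs.
corpus = [
    ([("name", "Ada Smith"), ("birth-year", "1950"), ("eye-color", "brown")],
     "Ada Smith, born in 1950, is a painter."),
    ([("name", "Bo Lee"), ("birth-year", "1962"), ("eye-color", "green")],
     "Bo Lee was born in 1962."),
]
labelled = [pair for data, text in corpus for pair in label_data(data, text)]
rules = learn_rules(labelled)  # {"name", "birth-year"}

# Unseen input, plus what a (hypothetical) human author chose to mention.
unseen = [("name", "Cy Park"), ("birth-year", "1971"), ("eye-color", "blue")]
gold = {("name", "Cy Park")}
scores = precision_recall_f1(select(unseen, rules), gold)  # (0.5, 1.0, ~0.667)
```

A real system would replace both stand-ins. The substring test is brittle for paraphrased or inferred facts, and a per-attribute vote ignores dependencies between data. The sketch only shows how indirect supervision yields labelled data that an ordinary supervised learner can consume.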

Author information

Authors and Affiliations

Dept. of Computer Science, Columbia University, New York, NY 10025, USA
Pablo A. Duboue

Editor information

Editors and Affiliations

Information Technology Research Institute, University of Brighton, Lewes Road, BN2 4GJ, Brighton, UK
Anja Belz
University of Brighton, Brighton, UK
Roger Evans
NLG Group, Centre for Research in Computing, The Open University, Walton Hall, MK7 6AA, Milton Keynes, UK
Paul Piwek

About this paper

Cite this paper

Duboue, P.A. (2004). Indirect Supervised Learning of Content Selection Logic. In: Belz, A., Evans, R., Piwek, P. (eds) Natural Language Generation. INLG 2004. Lecture Notes in Computer Science, vol 3123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27823-8_5

DOI: https://doi.org/10.1007/978-3-540-27823-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22340-5
Online ISBN: 978-3-540-27823-8
eBook Packages: Springer Book Archive